
(IJACSA) International Journal of Advanced Computer Science and Applications,

Vol. 5, No. 12, 2014

Editorial Preface
From the Desk of Managing Editor…
It is our pleasure to present to you the December 2014 Issue of International Journal of Advanced Computer Science
and Applications.

Today, it is incredible to consider that in 1969 men landed on the moon using a computer with a 32-kilobyte memory
that was only programmable by the use of punch cards. In 1971, astronaut Alan Shepard participated in the first
computer "hack" while orbiting the moon in his landing vehicle, as two programmers back on Earth attempted to "hack"
into the duplicate computer to find a way for Shepard to convince his computer that a catastrophe requiring a
mission abort was not happening; the successful hack took 45 minutes to accomplish, and Shepard went on to hit his
golf ball on the moon. Today, the average computer sitting on the desk of a suburban home office has more
computing power than the entire U.S. space program that put humans on another world!

Computer science has affected the human condition in many radical ways. Throughout its history, its developers have
striven to make calculation and computation easier, as well as to offer new means by which the other sciences can be
advanced. Modern massively parallel supercomputers help scientists with previously infeasible problems such as
fluid dynamics, complex function convergence, finite element analysis and real-time weather dynamics.

At IJACSA we believe in spreading subject knowledge effectively to all classes of audience. Nevertheless,
the promise of increased engagement requires that we consider how this might be accomplished, delivering
up-to-date and authoritative coverage of advanced computer science and applications.

Throughout our archives, new ideas and technologies have been welcomed, carefully critiqued, and discarded or
accepted by qualified reviewers and associate editors. Our efforts to improve the quality of the articles published and
expand their reach to the interested audience will continue, and these efforts will require critical minds and careful
consideration to assess the quality, relevance, and readability of individual articles.

To summarise, the journal has offered its readership thought-provoking theoretical, philosophical, and empirical ideas
from some of the finest minds worldwide. We thank all our readers for their continued support and goodwill for IJACSA.
We will keep you posted on updates about the new programmes launched in collaboration.

Lastly, we would like to express our gratitude to all authors, whose research results have been published in our journal, as
well as our referees for their in-depth evaluations.

We hope that the materials contained in this volume will satisfy your expectations and entice you to submit your own
contributions in upcoming issues of IJACSA.

Thank you for Sharing Wisdom!

Managing Editor
IJACSA
Volume 5 Issue 12 December 2014
ISSN 2156-5570 (Online)
ISSN 2158-107X (Print)
©2013 The Science and Information (SAI) Organization

www.ijacsa.thesai.org

Editorial Board
Editor-in-Chief

Dr. Kohei Arai - Saga University
Domains of Research: Human-Computer Interaction, Networking, Information Retrievals, Optimization
Theory, Modelling and Simulation, Satellite Remote Sensing, Computer Vision, Decision Making
Methodology

Associate Editors

Chao-Tung Yang
Department of Computer Science, Tunghai University, Taiwan
Domain of Research: Cloud Computing

Elena SCUTELNICU
"Dunarea de Jos" University of Galati, Romania
Domain of Research: e-Learning Tools, Modelling and Simulation of Welding Processes

Krassen Stefanov
Professor at Sofia University St. Kliment Ohridski, Bulgaria
Domains of Research: Digital Libraries

Maria-Angeles Grado-Caffaro
Scientific Consultant, Italy
Domain of Research: Sensing and Sensor Networks

Mohd Helmy Abd Wahab
Universiti Tun Hussein Onn Malaysia
Domain of Research: Data Mining, Database, Web-based Application, Mobile Computing

T. V. Prasad
Lingaya's University, India
Domain of Research: Bioinformatics, Natural Language Processing, Image Processing, Robotics, Knowledge
Representation


Reviewer Board Members


• Abbas Karimi
Islamic Azad University Arak Branch
• Abdel-Hameed Badawy
Arkansas Tech University
• Abdelghni Lakehal
Fsdm Sidi Mohammed Ben Abdellah University
• Abeer Elkorny
Faculty of computers and information, Cairo University
• ADEMOLA ADESINA
University of the Western Cape, South Africa
• Ahmed Boutejdar
• Dr. Ahmed Nabih Zaki Rashed
Menoufia University
• Aderemi A. Atayero
Covenant University
• Akbar Hossin
• Akram Belghith
University Of California, San Diego
• Albert S
Kongu Engineering College
• Alcinia Zita Sampaio
Technical University of Lisbon
• Ali Ismail Awad
Luleå University of Technology
• Alexandre Bouënard
• Amitava Biswas
Cisco Systems
• Anand Nayyar
KCL Institute of Management and Technology, Jalandhar
• Andi Wahju Rahardjo Emanuel
Maranatha Christian University, INDONESIA
• Anirban Sarkar
National Institute of Technology, Durgapur, India
• Anuranjan misra
Bhagwant Institute of Technology, Ghaziabad, India
• Andrews Samraj
Mahendra Engineering College
• Arash Habibi Lashakri
University Technology Malaysia (UTM)
• Aris Skander
Constantine University
• Ashraf Mohammed Iqbal
Dalhousie University and Capital Health
• Ashok Matani
• Ashraf Owis
Cairo University
• Asoke Nath
St. Xaviers College
• Ayad Ismaeel
Department of Information Systems Engineering-Technical Engineering College-Erbil / Hawler Polytechnic University, Erbil-Kurdistan Region- IRAQ
• Babatunde Opeoluwa Akinkunmi
University of Ibadan
• Badre Bossoufi
University of Liege
• Basil Hamed
Islamic University of Gaza
• Bharti Waman Gawali
Department of Computer Science & information
• Bhanu Prasad Pinnamaneni
Rajalakshmi Engineering College; Matrix Vision GmbH
• Bilian Song
LinkedIn
• Brahim Raouyane
FSAC
• Brij Gupta
University of New Brunswick
• Bright Keswani
Suresh Gyan Vihar University, Jaipur (Rajasthan) INDIA
• Constantin Filote
Stefan cel Mare University of Suceava
• Constantin Popescu
Department of Mathematics and Computer Science, University of Oradea
• Chandrashekhar Meshram
Chhattisgarh Swami Vivekananda Technical University
• Chao Wang


• Chi-Hua Chen
National Chiao-Tung University
• Ciprian Dobre
University Politehnica of Bucharest
• Chien-Pheg Ho
Information and Communications Research Laboratories, Industrial Technology Research Institute of Taiwan
• Charlie Obimbo
University of Guelph
• Chao-Tung Yang
Department of Computer Science, Tunghai University
• Dana PETCU
West University of Timisoara
• Deepak Garg
Thapar University
• Dewi Nasien
Universiti Teknologi Malaysia
• Dheyaa Kadhim
University of Baghdad
• Dong-Han Ham
Chonnam National University
• Dragana Becejski-Vujaklija
University of Belgrade, Faculty of organizational sciences
• Driss EL OUADGHIRI
• Duck Hee Lee
Medical Engineering R&D Center/Asan Institute for Life Sciences/Asan Medical Center
• Dr. Santosh Kumar
Graphic Era University, Dehradun, India
• Elena Camossi
Joint Research Centre
• Eui Lee
• Elena SCUTELNICU
"Dunarea de Jos" University of Galati
• Firkhan Ali Hamid Ali
UTHM
• Fokrul Alom Mazarbhuiya
King Khalid University
• Frank Ibikunle
Covenant University
• Fu-Chien Kao
Da-Yeh University
• Faris Al-Salem
GCET
• gamil Abdel Azim
Associate prof - Suez Canal University
• Ganesh Sahoo
RMRIMS
• Gaurav Kumar
Manav Bharti University, Solan Himachal Pradesh
• Ghalem Belalem
University of Oran (Es Senia)
• Giri Babu
Indian Space Research Organisation
• Giacomo Veneri
University of Siena
• Gerard Dumancas
Oklahoma Medical Research Foundation
• Georgios Galatas
• George Mastorakis
Technological Educational Institute of Crete
• Gunaseelan Devaraj
Jazan University, Kingdom of Saudi Arabia
• Gavril Grebenisan
University of Oradea
• Hadj Tadjine
IAV GmbH
• Hamid Mukhtar
National University of Sciences and Technology
• Hamid Alinejad-Rokny
University of Newcastle
• Harco Leslie Hendric Spits Warnars
Budi LUhur University
• Harish Garg
Thapar University Patiala
• Hamez l. El Shekh Ahmed
Pure mathematics
• Hesham Ibrahim
Chemical Engineering Department, Faculty of Engineering, Al-Mergheb University
• Dr. Himanshu Aggarwal
Punjabi University, India
• Huda K. AL-Jobori
Ahlia University
• Iwan Setyawan
Satya Wacana Christian University


• Dr. Jamaiah Haji Yahaya
Northern University of Malaysia (UUM), Malaysia
• James Coleman
Edge Hill University
• Jim Wang
The State University of New York at Buffalo, Buffalo, NY
• John Salin
George Washington University
• Jyoti Chaudary
High performance computing research lab
• Jatinderkumar R. Saini
S.P.College of Engineering, Gujarat
• K Ramani
K.S.Rangasamy College of Technology, Tiruchengode
• K V.L.N.Acharyulu
Bapatla Engineering college
• Kashif Nisar
Universiti Utara Malaysia
• Kayhan Zrar Ghafoor
University Technology Malaysia
• Kitimaporn Choochote
Prince of Songkla University, Phuket Campus
• Kunal Patel
Ingenuity Systems, USA
• Krasimir Yordzhev
South-West University, Faculty of Mathematics and Natural Sciences, Blagoevgrad, Bulgaria
• Krassen Stefanov
Professor at Sofia University St. Kliment Ohridski
• Labib Francis Gergis
Misr Academy for Engineering and Technology
• Lai Khin Wee
Biomedical Engineering Department, University Malaya
• Lazar Stosic
College for professional studies educators Aleksinac, Serbia
• Lijian Sun
Chinese Academy of Surveying and Mapping, China
• Leandors Maglaras
• Leon Abdillah
Bina Darma University
• Ljubomir Jerinic
University of Novi Sad, Faculty of Sciences, Department of Mathematics and Computer Science
• Lokesh Sharma
Indian Council of Medical Research
• Long Chen
Qualcomm Incorporated
• M. Reza Mashinchi
• M. Tariq Banday
University of Kashmir
• MAMTA BAHETI
SNJBS KBJ COLLEGE OF ENGINEERING, CHANDWAD, NASHIK, M.S. INDIA
• Mazin Al-Hakeem
Research and Development Directorate - Iraqi Ministry of Higher Education and Research
• Md Rana
University of Sydney
• Miriampally Venkata Raghavendera
Adama Science & Technology University, Ethiopia
• Mirjana Popvic
School of Electrical Engineering, Belgrade University
• Manas deep
Masters in Cyber Law & Information Security
• Manpreet Singh Manna
SLIET University, Govt. of India
• Manuj Darbari
BBD University
• Md. Zia Ur Rahman
Narasaraopeta Engg. College, Narasaraopeta
• Messaouda AZZOUZI
Ziane AChour University of Djelfa
• Dr. Michael Watts
University of Adelaide
• Milena Bogdanovic
University of Nis, Teacher Training Faculty in Vranje
• Miroslav Baca
University of Zagreb, Faculty of organization and informatics / Center for biomet
• Mohamed Ali Mahjoub
Preparatory Institute of Engineer of Monastir
• Mohamed El-Sayed
Faculty of Science, Fayoum University, Egypt
• Mohammad Yamin
• Mohammad Ali Badamchizadeh
University of Tabriz
• Mohamed Najeh Lakhoua
ESTI, University of Carthage


• Mohammad Alomari
Applied Science University
• Mohammad Kaiser
Institute of Information Technology
• Mohammed Al-Shabi
Assistant Prof.
• Mohammed Sadgal
• Mourad Amad
Laboratory LAMOS, Bejaia University
• Mohammed Ali Hussain
Sri Sai Madhavi Institute of Science & Technology
• Mohd Helmy Abd Wahab
Universiti Tun Hussein Onn Malaysia
• Mueen Uddin
Universiti Teknologi Malaysia UTM
• Mona Elshinawy
Howard University
• Maria-Angeles Grado-Caffaro
Scientific Consultant
• Mehdi Bahrami
University of California, Merced
• Miriampally Venkata Raghavendra
Adama Science & Technology University, Ethiopia
• Murthy Dasika
SreeNidhi Institute of Science and Technology
• Mostafa Ezziyyani
FSTT
• Marcellin Julius Nkenlifack
University of Dschang
• Natarajan Subramanyam
PES Institute of Technology
• Noura Aknin
University Abdelamlek Essaadi
• Nidhi Arora
M.C.A. Institute, Ganpat University
• Nazeeruddin Mohammad
Prince Mohammad Bin Fahd University
• Najib Kofahi
Yarmouk University
• NEERAJ SHUKLA
ITM UNiversity, Gurgaon, (Haryana) India
• N.Ch. Iyengar
VIT University
• Om Sangwan
• Oliviu Matel
Technical University of Cluj-Napoca
• Osama Omer
Aswan University
• Ousmane Thiare
Associate Professor University Gaston Berger of Saint-Louis SENEGAL
• Omaima Al-Allaf
Assistant Professor
• Paresh V Virparia
Sardar Patel University
• Dr. Poonam Garg
Institute of Management Technology, Ghaziabad
• Professor Ajantha Herath
• Prabhat K Mahanti
UNIVERSITY OF NEW BRUNSWICK
• Qufeng Qiao
University of Virginia
• Rachid Saadane
EE departement EHTP
• raed Kanaan
Amman Arab University
• Raja boddu
LENORA COLLEGE OF ENGINEERING
• Ravisankar Hari
SENIOR SCIENTIST, CTRI, RAJAHMUNDRY
• Raghuraj Singh
• Rajesh Kumar
National University of Singapore
• Rakesh Balabantaray
IIIT Bhubaneswar
• Rashad Al-Jawfi
Ibb university
• Rashid Sheikh
Shri Venkteshwar Institute of Technology, Indore
• Ravi Prakash
University of Mumbai
• Rawya Rizk
Port Said University
• Reshmy Krishnan
Muscat College affiliated to stirling University.U
• Ricardo Vardasca
Faculty of Engineering of University of Porto
• Ritaban Dutta
ISSL, CSIRO, Tasmania, Australia
• Rowayda Sadek
• Ruchika Malhotra
Delhi Technological University
• Saadi Slami
University of Djelfa

• Sachin Kumar Agrawal
University of Limerick
• Dr.Sagarmay Deb
University Lecturer, Central Queensland University, Australia
• Said Ghoniemy
Taif University
• Sasan Adibi
Research In Motion (RIM)
• Sérgio Ferreira
School of Education and Psychology, Portuguese Catholic University
• Sebastian Marius Rosu
Special Telecommunications Service
• Selem charfi
University of Valenciennes and Hainaut Cambresis, France.
• Seema Shah
Vidyalankar Institute of Technology Mumbai,
• Sengottuvelan P
Anna University, Chennai
• Senol Piskin
Istanbul Technical University, Informatics Institute
• Seyed Hamidreza Mohades Kasaei
University of Isfahan
• Shafiqul Abidin
G GS I P University
• Shahanawaj Ahamad
The University of Al-Kharj
• Shawkl Al-Dubaee
Assistant Professor
• Shriram Vasudevan
Amrita University
• Sherif Hussain
Mansoura University
• Siddhartha Jonnalagadda
Mayo Clinic
• Sivakumar Poruran
SKP ENGINEERING COLLEGE
• Sim-Hui Tee
Multimedia University
• Simon Ewedafe
Baze University
• SUKUMAR SENTHILKUMAR
Universiti Sains Malaysia
• Slim Ben Saoud
• Sudarson Jena
GITAM University, Hyderabad
• Sumit Goyal
• Sumazly Sulaiman
Institute of Space Science (ANGKASA), Universiti Kebangsaan Malaysia
• Sohail Jabb
Bahria University
• Suhas J Manangi
Microsoft
• Suresh Sankaranarayanan
Institut Teknologi Brunei
• Susarla Sastry
J.N.T.U., Kakinada
• Syed Ali
SMI University Karachi Pakistan
• T C. Manjunath
HKBK College of Engg
• T V Narayana Rao
Hyderabad Institute of Technology and Management
• T. V. Prasad
Lingaya's University
• Taiwo Ayodele
Infonetmedia/University of Portsmouth
• Tarek Gharib
• THABET SLIMANI
College of Computer Science and Information Technology
• Totok R. Biyanto
Engineering Physics, ITS Surabaya
• TOUATI YOUCEF
Computer sce Lab LIASD - University of Paris 8
• VINAYAK BAIRAGI
Sinhgad Academy of engineering, Pune
• VISHNU MISHRA
SVNIT, Surat
• Vitus S.W. Lam
The University of Hong Kong
• Vuda SREENIVASARAO
School of Computing and Electrical Engineering, BAHIR DAR UNIVERSITY, BAHIR DAR, ETHIOPIA
• Vaka MOHAN
TRR COLLEGE OF ENGINEERING
• Wei Wei
• Xiaojing Xiang
AT&T Labs


• YASSER ATTIA ALBAGORY
College of Computers and Information Technology, Taif University, Saudi Arabia
• YI FEI WANG
The University of British Columbia
• Yilun Shang
University of Texas at San Antonio
• YU QI
Mesh Capital LLC
• Zacchaeus Omogbadegun
Covenant University
• ZAIRI ISMAEL RIZMAN
UiTM (Terengganu) Dungun Campus
• ZENZO POLITE NCUBE
North West University
• ZHAO ZHANG
Department of EE, City University of Hong Kong
• ZHIXIN CHEN
ILX Lightwave Corporation
• ZLATKO STAPIC
University of Zagreb
• Ziyue Xu
• ZURAINI ISMAIL
Universiti Teknologi Malaysia


CONTENTS
Paper 1: Policy-Based Automation of Dynamique and Multipoint Virtual Private Network Simulation on OPNET Modeler
Authors: Ayoub BAHNASSE, Najib EL KAMOUN
PAGE 1 – 7

Paper 2: Development of Social Recommendation GIS for Tourist Spots
Authors: Tsukasa IKEDA, Kayoko YAMAMOTO
PAGE 8 – 21

Paper 3: A Feasibility Study on Porting the Community Land Model onto Accelerators Using Openacc
Authors: D. Wang, W. Wu, F. Winkler, W. Ding, O. Hernandez
PAGE 22 – 29

Paper 4: B2C E-Commerce Fact-Based Negotiation Using Big Data Analytics and Agent-Based Technologies
Authors: Hasan Al-Sakran
PAGE 30 – 37

Paper 5: A Study of MCA Learning Algorithm for Incident Signals Estimation
Authors: Rashid Ahmed, John N. Avaritsiotis
PAGE 38 – 44

Paper 6: Empirical Analyis of Public ICT Development Project Objectives in Hungary
Authors: Márta Aranyossy, András Nemeslaki, Adrienn Fekó
PAGE 45 – 54

Paper 7: Investigating the Idiotop Paratop Interaction in the Artificial Immune Networks
Authors: Hossam Meshref
PAGE 55 – 60

Paper 8: Using Object-Relational Mapping to Create the Distributed Databases in a Hybrid Cloud Infrastructure
Authors: Oleg Lukyanchikov, Evgeniy Pluzhnik, Simon Payain, Evgeny Nikulchev
PAGE 61 – 64

Paper 9: Definition of Tactile Interactions for a Multi-Criteria Selection in a Virtual World
Authors: Robin Vivian, David Bertolo, Jérôme Dinet
PAGE 65 – 71

Paper 10: Natural Language Processing and its Use in Education
Authors: Dr. Khaled M. Alhawiti
PAGE 72 – 76

Paper 11: An Upper Ontology for Benefits Management of Cloud Computing
Authors: Richard Greenwell*, Xiaodong Liu, Kevin Chalmers
PAGE 77 – 86

Paper 12: Determining the Efficient Structure of Feed-Forward Neural Network to Classify Breast Cancer Dataset
Authors: Ahmed Khalid, Noureldien A. Noureldien
PAGE 87 – 90


Paper 13: Use of Geographic Information System Tools in Research on Neonatal Outcomes in a Maternity-School in Belo
Horizonte - Brazil
Authors: Juliano de S. Gaspar, Thabata Sá, Zilma S. N. Reis, Renato F. N. Júnior, Marcelo S. Júnior, Raphael R.
Gusmão
PAGE 91 – 96

Paper 14: A Grammatical Inference Sequential Mining Algorithm for Protein Fold Recognition
Authors: Taysir Hassan A. Soliman, Ahmed Sharaf Eldin, Marwa M. Ghareeb, Mohammed E. Marie
PAGE 97 – 106

Paper 15: Facial Expression Recognition Using 3D Convolutional Neural Network
Authors: Young-Hyen Byeon, Keun-Chang Kwak*
PAGE 107 – 112

Paper 16: Social Media in Azorean Organizations: Policies, Strategies and Perceptions
Authors: Nuno Filipe Cordeiro, Teresa Tiago, Flávio Tiago, Francisco Amaral
PAGE 113 – 119

Paper 17: Weighted Marking, Clique Structure and Node-Weighted Centrality to Predict Distribution Centre’s Location in
a Supply Chain Management
Authors: Amidu A. G. Akanmu, Frank Z. Wang, Fred A. Yamoah
PAGE 120 – 128

Paper 18: Social Networks’ Benefits, Privacy, and Identity Theft: KSA Case Study
Authors: Ahmad A. Al-Daraiseh, Afnan S. Al-Joudi, Hanan B. Al-Gahtani, Maha S. Al-Qahtani
PAGE 129 – 143

Paper 19: Software Architecture Reconstruction Method, a Survey
Authors: Zainab Nayyar, Nazish Rafique
PAGE 144 – 150

Paper 20: Zigbee Routing Opnet Simulation for a Wireless Sensors Network
Authors: ELKISSANI Kaoutar, Pr.Moughit Mohammed, Pr.Nasserdine Bouchaib
PAGE 151 – 154


Policy-Based Automation of Dynamique and Multipoint Virtual Private Network Simulation on OPNET Modeler

Ayoub BAHNASSE
Department of physics
University Chouaïb Doukali
Faculty of science El Jadida
EL Jadida, MOROCCO

Najib EL KAMOUN
Department of physics
University Chouaïb Doukali
Faculty of science El Jadida
EL Jadida, MOROCCO

Abstract—The simulation of large-scale networks is a challenging task, especially if the network to simulate is a Dynamic Multipoint Virtual Private Network, since expert knowledge is required to properly configure its component technologies. Studying these network architectures in a real environment is almost impossible because it requires a very large amount of equipment. The task is, however, feasible in a simulation environment such as OPNET Modeler, provided that one masters both the tool and the different architectures of the Dynamic Multipoint Virtual Private Network.

Several research studies have been conducted to automate the generation and simulation of complex networks under various simulators; according to our research, no work has dealt with the Dynamic Multipoint Virtual Private Network. In this paper we present a simulation model of the Dynamic Multipoint Virtual Private Network in OPNET Modeler, and a web-based tool for project management on the same network.

Keywords—VPN; multipoint; Opnet; automation; DMVPN; cloud; policy-based; web-based

I. INTRODUCTION
Dynamic Multipoint Virtual Private Network (DMVPN) is a solution for building dynamic Virtual Private Network tunnels in an easy and scalable manner, supported on Cisco IOS routers and Unix operating systems. DMVPN is based on standard technologies: Next Hop Resolution Protocol (NHRP) and multipoint Generic Routing Encapsulation (mGRE) for the dynamic creation of tunnels, Internet Protocol Security (IPsec) to secure data exchanges between multiple sites, and routing protocols to route data optimally [1] [2]. Several scientific studies have examined the effect of routing protocols on Non-Broadcast Multi-Access (NBMA) networks [3] [4].

The HUB maintains in its NHRP cache the public and tunnel IP addresses of each SPOKE on the same network. NHRP is based on the client-server principle: the spokes (NHRP clients) send periodic NHRP updates containing their public and tunnel addresses to the HUB (the Next Hop Server, NHS) of the network. For example, when SPOKE1 wants to communicate with SPOKE2, SPOKE1 consults the NHRP cache of the NHS to determine the public IP address associated with the tunnel IP address of SPOKE2. Thanks to the mGRE protocol, a single GRE interface can maintain multiple IPsec tunnels, which both simplifies configuration and saves time.

The GRE protocol encapsulates various higher-layer protocols and carries all traffic types (unicast, multicast and broadcast), but does not provide any authentication, integrity or confidentiality mechanism. IPsec is a suite of protocols that includes Encapsulating Security Payload (ESP) and Authentication Header (AH): the first ensures the integrity, authentication and confidentiality of exchanges, while the second provides integrity and authentication only. IPsec operates in two modes, tunnel and transport. Transport mode does not change the initial IP header; the IPsec header sits between the network and transport layers of the OSI model, and in this mode NAT can cause an integrity problem [5]. Tunnel mode encapsulates the entire original packet, including its IP header, behind a new IP header.

OPNET Modeler is a software tool for network modeling and simulation. It allows the design and study of large-scale communication networks, devices, protocols and applications with great flexibility, and the study of system performance under varying conditions. It also contributes to the development and optimization of new protocols and architectures and to the analysis of the impact of emerging technologies. Several books have been written on mastering the OPNET Modeler environment and properly handling its associated objects [6, 7].

An Opnet project can be set up by several methods, including:
• Dragging and dropping objects onto the workspace;
• Router configuration data: the project is created from the configuration files of routers such as Cisco and Juniper; to benefit from this feature, the Multi Vendor Import (MVI) module must be enabled in license management;
• Extensible Markup Language (XML): the required form of the XML file to import into Opnet is specified in the Document Type Definition (DTD); the file path is <opnet_dir>/<reldir>/sys/etc/network.dtd.

The simulation of communication networks is paramount in the design, planning and optimization of architectures. A simulation environment makes it possible to study conditions, such as scalability, that are difficult to reproduce in a real environment because of their very high cost; dynamic multipoint virtual private networks are one such case. Several scientific research simulators can be used, such as OPNET Modeler and NS2 [8, 9, 10], but managing a dynamic multipoint VPN under OPNET Modeler requires a mastery of both the tool and the technology. This motivated us to develop a system that automatically creates projects for various architectures of the same network: an automation model for simulating multi-architecture Dynamic Multipoint Virtual Private Networks, and a graphical human-machine application designed for this type of network.

Simulating a large-scale network such as DMVPN in a simulator such as Opnet Modeler requires a mastery of both the VPN technology and the simulator. Since these VPNs can be composed of hundreds, sometimes thousands, of sites, simulating them manually without mistakes is a big challenge. Various works have addressed the automation of network simulations for Opnet Modeler [11, 12] and the design of a GUI-based tool for converting simulation scenarios to XML files meant for various simulators [13]. Unfortunately, according to our research, no automation model for the generation and simulation of such networks has been proposed. This motivated us to develop a new model for automating simulations of DMVPN networks for Opnet Modeler, "DMVPN Automatic Simulation", and to create a web-based tool for personalized management of projects.

The rest of the paper is organized as follows. Section II discusses the developed model "DMVPN Automatic Simulation" and defines its various modules. Section III describes in detail the steps required by the model to automatically generate projects. Section IV is reserved for a sample demonstration of the automatic generation of a project using the implemented application, and we conclude in Section V.

II. DMVPN AUTOMATIC SIMULATION MODEL
The DMVPN Automatic Simulation model [Fig. 1] allows policy-based simulation automation for multi-architecture DMVPN networks in Opnet Modeler using a web graphical interface. The model is composed of two main agents: "User Policies Definition" and "Treatment and generation".

A. User Policies Definition:
This agent allows the attributes of the security and routing policies of the DMVPN network to be defined through a graphical human-machine interaction.

This agent is composed of several modules: Architecture Module, Tunnel Module, Security Module, Key Module and Routing Module.
• Architecture Module: This module defines the type of architecture to handle: Single Hub Single Cloud or Multiple Hub Multiple Cloud.
• Tunnel Module: This module is responsible for establishing tunnels between the Hubs and Spokes depending on the type of architecture described in the previous module. The identification and authentication of tunnels are based on the Key Module attributes.
• Security Module: This module defines the IPsec protocol to use (AH or ESP), the encryption protocols (DES, 3DES, AES) and the integrity protocols (MD5, SHA) for the two IKE phases. By default the mode used is transport, to avoid a third encapsulation of the IP header.
• Key Module: This module defines the identification key of the tunnel, the DMVPN cloud ID, the authentication key for access to the DMVPN network, and the IPsec password.
• Routing Module: This module generates the most suitable routing protocol configuration for a specific DMVPN architecture. The proposed model supports Routing Information Protocol version 2 (RIPv2), Enhanced Interior Gateway Routing Protocol (EIGRP), Open Shortest Path First (OSPF) and internal Border Gateway Protocol (iBGP).

B. Treatment and generation:
This agent describes the processing that occurs on the server side, converting user data into a fully configured project ready to be simulated in Opnet Modeler. This agent is composed of three modules:
• Device personalization module: This module generates the nodes (routers and IPv4 clouds) with a customized number of interfaces according to the user-specified architecture.
• XML to map OPNET Module: This module checks the attributes of the file network.dtd to prepare a customized XML file with the user-specified data. XML attributes may differ from one architecture to another; the equipment generated by the previous module is defined in the XML file.
• Project generation module: This module generates the XML file prepared by the previous module and runs the simulation in Opnet Modeler.
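As a rough illustration of what the User Policies Definition agent collects, the attributes listed for its modules could be grouped into a small data model. This is only a sketch: the class names, field names and the non-transport defaults are assumptions made for the example and do not come from the authors' tool; only the option sets (AH/ESP, DES/3DES/AES, MD5/SHA, the four routing protocols, the two architectures and the transport default) are taken from the text.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Architecture(Enum):
    SINGLE_HUB_SINGLE_CLOUD = "Single Hub Single Cloud"
    MULTIPLE_HUB_MULTIPLE_CLOUD = "Multiple Hub Multiple Cloud"


# Option sets listed in the module descriptions.
IPSEC_PROTOCOLS = {"AH", "ESP"}
ENCRYPTION_ALGORITHMS = {"DES", "3DES", "AES"}
INTEGRITY_ALGORITHMS = {"MD5", "SHA"}
IPSEC_MODES = {"transport", "tunnel"}
ROUTING_PROTOCOLS = {"RIPv2", "EIGRP", "OSPF", "iBGP"}


@dataclass
class SecurityPolicy:
    ipsec_protocol: str = "ESP"  # illustrative default (assumption)
    encryption: str = "AES"      # illustrative default (assumption)
    integrity: str = "SHA"       # illustrative default (assumption)
    mode: str = "transport"      # default mode stated in the text


@dataclass
class KeyPolicy:
    # Hypothetical field names for the four Key Module attributes.
    tunnel_key: str
    cloud_id: str
    network_auth_key: str
    ipsec_password: str


@dataclass
class DmvpnPolicy:
    architecture: Architecture
    routing_protocol: str
    keys: KeyPolicy
    security: SecurityPolicy = field(default_factory=SecurityPolicy)

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the policy is acceptable."""
        checks = [
            ("IPsec protocol", self.security.ipsec_protocol, IPSEC_PROTOCOLS),
            ("encryption algorithm", self.security.encryption, ENCRYPTION_ALGORITHMS),
            ("integrity algorithm", self.security.integrity, INTEGRITY_ALGORITHMS),
            ("IPsec mode", self.security.mode, IPSEC_MODES),
            ("routing protocol", self.routing_protocol, ROUTING_PROTOCOLS),
        ]
        return [
            f"unsupported {label}: {value!r}"
            for label, value, allowed in checks
            if value not in allowed
        ]
```

Validating the policy before any project is generated keeps configuration mistakes out of the later generation stage, which is where they would be hardest to trace.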

Fig. 1. Architecture of DMVPN Automatic Simulation
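The Treatment and generation agent turns the user's choices into an XML file that Opnet Modeler can import. The sketch below shows the general idea with Python's standard `xml.etree.ElementTree`. The element and attribute names (`network`, `node`, `attr`, `link`, `model`, `x`, `y`) are placeholders invented for this example; the real names are dictated by the `network.dtd` file of the installed Opnet Modeler version, which is not reproduced in the paper.

```python
import xml.etree.ElementTree as ET


def build_network_xml(hub_name, spoke_names, routing_protocol):
    """Build an illustrative XML description of a Single Hub Single Cloud network.

    All element and attribute names are hypothetical placeholders,
    not those of OPNET's network.dtd.
    """
    root = ET.Element("network", {"name": "dmvpn_demo"})

    # One IPv4 cloud that every router attaches to.
    ET.SubElement(root, "node", {"name": "internet", "model": "ip_cloud"})

    devices = [hub_name, *spoke_names]
    for index, name in enumerate(devices):
        # Spread the routers horizontally so each node has a distinct position.
        node = ET.SubElement(
            root,
            "node",
            {"name": name, "model": "router", "x": str(100 * index), "y": "0"},
        )
        role = "hub" if index == 0 else "spoke"
        ET.SubElement(node, "attr", {"name": "role", "value": role})
        ET.SubElement(node, "attr", {"name": "routing", "value": routing_protocol})
        ET.SubElement(root, "link", {"src": name, "dst": "internet"})

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_network_xml("HUB", ["SPOKE1", "SPOKE2"], "OSPF"))
```

Building the tree programmatically, rather than concatenating strings, guarantees well-formed output; checking it against the DTD would still require a validating parser, which the standard library does not provide.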


III. FUNCTIONING OF THE DMVPN AUTOMATIC SIMULATION MODEL
In this section we describe the steps required by the DMVPN Automatic Simulation model to automatically generate projects; [Fig. 2] shows the operation of the model.

Fig. 2. Flow chart illustrating the operation of DMVPN Automatic Simulation
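Two rules in the procedure of this section lend themselves to a short sketch: the input form has n + 1 rows for a Single Hub Single Cloud deployment with n Spokes, and HUB priorities decide between equal-cost load balancing and a primary/secondary split. The function names and role labels below are hypothetical. The sketch also assumes that a multi-hub form has one row per HUB plus one per Spoke, that "highest priority" means the numerically largest value, and that with more than two HUBs a single primary is chosen; the paper does not state any of these.

```python
def form_rows(spoke_count, hub_count=1):
    """Return the row labels of the input form: HUB rows first, then Spoke rows."""
    if spoke_count < 1 or hub_count < 1:
        raise ValueError("at least one HUB and one Spoke are required")
    if hub_count == 1:
        hubs = ["HUB"]
    else:
        hubs = [f"HUB{i}" for i in range(1, hub_count + 1)]
    spokes = [f"SPOKE{i}" for i in range(1, spoke_count + 1)]
    return hubs + spokes


def hub_roles(priorities):
    """Map each HUB name to a role, given a {name: priority} dictionary."""
    if not priorities:
        raise ValueError("at least one HUB priority is required")
    if len(set(priorities.values())) == 1:
        # Same priority everywhere: equal-cost load balancing between HUBs.
        return {name: "load-balanced" for name in priorities}
    # Otherwise the HUB with the highest priority becomes the primary router.
    primary = max(priorities, key=priorities.get)
    return {
        name: "primary" if name == primary else "secondary"
        for name in priorities
    }
```

For example, `form_rows(2)` yields one HUB row and two Spoke rows, matching the n + 1 rule with n = 2.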

1) The user must choose the architecture to deploy: Single Hub Single Cloud or Multiple Hub Multiple Cloud;
2) If the user chooses to simulate a Single Hub Single Cloud architecture, the number of Spokes to deploy must be specified. Based on this number, a graphical user interface composed of n + 1 rows is generated automatically, where n is the number of Spokes and the remaining row is the HUB line;
3) The user must specify for each device its public IP address, its private IP address and the name of the public interface;
4) The user graphically defines the security settings of IKE Phase 1 and 2, specifies the NHRP password and the NHRP and mGRE keys, and finally chooses the routing protocol (RIPv2, EIGRP, OSPF, iBGP);
5) If the user chooses Multiple Hub Multiple Cloud, the number of Hubs and Spokes to deploy must be specified;
6) The user must specify for each device its public IP address, its private IP address, the name of the public interface and the priority of each HUB. If the routers have the same priority, equal-cost load balancing is performed between the HUBs; otherwise the router with the highest priority is the primary router and the other is considered secondary;
7) The user graphically defines the security settings of IPsec IKE Phase 1 and 2, specifies the NHRP password and the NHRP and mGRE keys, and finally chooses the routing protocol (RIPv2, EIGRP, OSPF, iBGP);
8) The nodes are created with a customized number of interfaces according to the user-specified architecture.

3|P age
www.ijacsa.thesai.org
(IJACSA) International Journal of Advanced Computer Science and Applications,
Vol. 5, No. 12, 2014

9) The XML attributes to be used for a specific version of the Opnet model are prepared according to the DTD file of the installed version of Opnet Modeler;
10) The final XML file is generated, containing the position of each node and its associated configuration, ready to be simulated in Opnet Modeler.

IV. DEMONSTRATION AND GUIDED VISIT
In order to validate the designed model, an implementation is required. The tool created is based on a guided web graphical interface that is easy to manipulate; any web browser and operating system can be used.
The developed tool (DMVPN Automatic Simulation Tool) has two main purposes. The first is to provide user-friendly entry and editing of the parameters of a DMVPN network. The second is to automatically map user parameters into an OPNET Modeler project and create custom nodes.

Fig. 3. Use Case Diagram of proposed tool

The modeling procedure [Fig. 3] consists of four steps:
Step 1: The user must choose the architecture to deploy;
Step 2: The user must indicate for each device its identity information (public, private and tunnel IP addresses, outside interface and private address mask);
Step 3: The user must indicate the security policy (IPsec attributes, NHRP password, and mGRE and NHRP keys) and the routing protocol (RIPv2, EIGRP, OSPF, iBGP) to be applied to all equipment in the same architecture;
Step 4: The DMVPN Automatic Simulation Tool automatically converts the user parameters into an XML configuration file ready to be simulated under OPNET Modeler.

The following demonstration simulates a DMVPN network with the Single Hub Single Cloud architecture, composed of two Spokes.

Step 1 : Specify the architecture to simulate:
Through the menu [Fig. 4], the user can choose to deploy a Single Hub Single Cloud architecture (1) or a Multiple Hub Multiple Cloud architecture (2).

Fig. 4. Main Menu

A window appears [Fig. 5], prompting the user to specify the number of Spokes to deploy (4).

Fig. 5. Specifying the number of Spokes to deploy

Step 2 : Define identity information:


Fig. 6. Specifying equipments data

After the number of Spokes to install is specified, a window [Fig. 6] is displayed. The window is mainly composed of two parts: identity configuration (5) and security and routing policies configuration (6). The tab (5) consists of two sections, HUB Configuration (7) and Spokes Configuration (8). Both sections are composed of the following fields: public IP address (9), outside interface (10), private IP address (11), subnet mask of the private address (12), and an option (13) to reset all fields of the current window.
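The identity fields listed above, together with the HUB priority rule of Section III, can be illustrated with a minimal Python sketch. It is not the tool's actual code: the class `DeviceIdentity`, the function `hub_roles` and all field names are hypothetical, and the validation uses only the standard `ipaddress` module.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class DeviceIdentity:
    """Identity fields of one HUB or Spoke row (names are illustrative)."""
    name: str
    public_ip: str
    outside_interface: str
    private_ip: str
    private_mask: str
    priority: int = 0  # only meaningful for HUBs in Multiple Hub Multiple Cloud

    def validate(self) -> None:
        # ip_address raises ValueError on a malformed address.
        ipaddress.ip_address(self.public_ip)
        # strict=False accepts a host address and derives its network.
        ipaddress.ip_network(f"{self.private_ip}/{self.private_mask}", strict=False)
        if not self.outside_interface:
            raise ValueError(f"{self.name}: outside interface is required")


def hub_roles(hubs):
    """Equal priorities mean equal-cost load balancing; otherwise the
    highest priority is primary and every other HUB is secondary."""
    if not hubs:
        return {}
    if len(hubs) > 1 and len({h.priority for h in hubs}) == 1:
        return {h.name: "load-balanced" for h in hubs}
    primary = max(hubs, key=lambda h: h.priority)
    return {h.name: "primary" if h is primary else "secondary" for h in hubs}
```

Calling `validate()` on every row before generation would reject malformed addresses early, which is one plausible way a form-driven tool keeps the margin of error low.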
Step 3 : Define security policy and routing protocol:

Fig. 7. Configuration of security and routing policies

The second part, security and routing policies configuration [Fig. 7], consists of four main sections: IPsec phase 1 configuration (15), IPsec phase 2 configuration (16), protection of the tunnel (17) and the choice of routing protocol (18).
Section (15) is composed of three fields: the choice of encryption protocol (19), the integrity protocol (20) and the password key derivation (21).


Section (16) is composed of three fields: the IPsec protocol to use, ESP or AH (22), and the encryption and integrity protocols, (23) and (24) respectively; the default mode is set to Transport.
Section (17) is composed of three fields: the NHRP password of the current network (25), the mGRE tunnel key (26) used to separate tunnels and provide authentication, and the identifier of the NHRP network (27).
The last section (18) allows the user to pick from a list the routing protocol to be implemented, which can be RIPv2, EIGRP, OSPF or iBGP (28).

Step 4 : Import generated XML File to OPNET Modeler:
After the customization of the architecture is completed, the submit button (29) sends the user parameters to the remote server in order to generate custom nodes and an XML file containing the configuration of the project, ready to be simulated in Opnet Modeler [Fig. 8].
The final step consists of importing the generated XML file into Opnet Modeler; [Fig. 9] illustrates the resulting topology.
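The conversion of user parameters into an XML project file can be sketched as follows. This is only an illustration under stated assumptions: the element and attribute names (`network`, `node`, `attr`, `routing`) and the function `build_project_xml` are hypothetical placeholders, since a real import file must follow the DTD of the installed Opnet Modeler version, which is not reproduced here.

```python
import xml.etree.ElementTree as ET


def build_project_xml(hub, spokes, routing_protocol="EIGRP"):
    """Serialize a Single Hub Single Cloud topology to an XML string.

    All element and attribute names are hypothetical; they do not
    reproduce the Opnet Modeler DTD.
    """
    root = ET.Element("network", {"architecture": "single-hub-single-cloud"})
    ET.SubElement(root, "routing", {"protocol": routing_protocol})
    for index, device in enumerate([hub, *spokes]):
        node = ET.SubElement(root, "node", {
            "name": device["name"],
            "role": "hub" if index == 0 else "spoke",
            # Place nodes on a simple grid so that each one has a position.
            "x": str(100 * index),
            "y": "0" if index == 0 else "200",
        })
        for key in ("public_ip", "outside_interface", "private_ip"):
            ET.SubElement(node, "attr", {"name": key, "value": device[key]})
    return ET.tostring(root, encoding="unicode")


# Example input: one HUB and two Spokes, as in the demonstration.
hub = {"name": "HUB", "public_ip": "203.0.113.1",
       "outside_interface": "Serial0/0", "private_ip": "192.168.0.1"}
spokes = [
    {"name": "Spoke1", "public_ip": "203.0.113.11",
     "outside_interface": "Serial0/0", "private_ip": "192.168.1.1"},
    {"name": "Spoke2", "public_ip": "203.0.113.12",
     "outside_interface": "Serial0/0", "private_ip": "192.168.2.1"},
]
project_xml = build_project_xml(hub, spokes, routing_protocol="OSPF")
```

Keeping the serialization in one function separate from the user interface matches the module independence that the paper relies on for targeting other simulators.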

Fig. 8. Resulting XML file

V. CONCLUSION
Manual simulation of a multi-architecture Dynamic Multipoint VPN network in Opnet Modeler is a time-consuming task that requires expertise both in the technology to simulate and in the simulator, and its margin of error is not null. The proposed model and the designed tool automate the generation of multi-architecture dynamic multipoint VPN projects for Opnet Modeler through a web-based interface that is easy to manipulate.
The model was implemented and tested on a Single Hub Single Cloud architecture consisting of ten Spokes. The time required for an expert on VPN networks and Opnet Modeler to set up this architecture manually is 40 minutes; the proposed model reduces it to 3 minutes. In addition to this time saving, the margin of error is null.
The independence of the modules of the proposed model will allow it to be adapted in future work to other simulators such as the NS3 simulator.

Fig. 9. Designed and configured Architecture


REFERENCES
[1] Asati, R., Khalid, M., Retana, A. E., Van Savage, D., & Sethi, P. P. (2013). U.S. Patent No. 8,346,961. Washington, DC: U.S. Patent and Trademark Office.
[2] Chen, H. (2011, May). Design and implementation of secure enterprise network based on DMVPN. In Business Management and Electronic Information (BMEI), 2011 International Conference on (Vol. 1, pp. 506-511). IEEE.
[3] Jankuniene, R., & Jankunaite, I. (2009, June). Route creation influence on DMVPN QoS. In Information Technology Interfaces, 2009. ITI'09. Proceedings of the ITI 2009 31st International Conference on (pp. 609-614). IEEE.
[4] Thorenoor, S. G. (2010, April). Dynamic routing protocol implementation decision between EIGRP, OSPF and RIP based on technical background using OPNET modeler. In Computer and Network Technology (ICCNT), 2010 Second International Conference on (pp. 191-195). IEEE.
[5] Aboba, B., & Dixon, W. (2004). RFC 3715: IPsec-network address translation (NAT) compatibility requirements.
[6] Lu, Z., & Yang, H. (2012). Unlocking the power of OPNET modeler. Cambridge University Press.
[7] Ibrahim, Q., & Khudher, I. A. (2011). Network Simulation Guide: Lecture Notes and Lab Manual.
[8] Altman, E., & Jimenez, T. (2012). NS Simulator for beginners. Synthesis Lectures on Communication Networks, 5(1), 1-184.
[9] Siraj, S., Gupta, A., & Badgujar, R. (2012). Network simulation tools survey. International Journal of Advanced Research in Computer and Communication Engineering, 1(4), 199-206.
[10] Borboruah, G., & Nandi, G. (2014). A Study on Large Scale Network Simulators. International Journal of Computer Science and Information Technologies, 5(6), 7318-7322.
[11] Mohorko, J., Klampfer, S., Fras, M., & Cucej, Z. Expert System for Automatic Analysis of Results of Network Simulation.
[12] Li, H., & Lin, X. (2005, October). An OPNET-based 3-tier network simulation architecture. In Communications and Information Technology, 2005. ISCIT 2005. IEEE International Symposium on (Vol. 2, pp. 793-796). IEEE.
[13] Canonico, R., Emma, D., & Ventre, G. (2003, October). An XML description language for web-based network simulation. In Distributed Simulation and Real-Time Applications, 2003. Proceedings. Seventh IEEE International Symposium on (pp. 76-81). IEEE.


Development of Social Recommendation GIS for Tourist Spots

Tsukasa IKEDA
Graduate School Student,
Graduate School of Information Systems,
University of Electro-Communications
Tokyo, Japan

Kayoko YAMAMOTO
Associate Professor,
Graduate School of Information Systems,
University of Electro-Communications
Tokyo, Japan

Abstract: This study aims to develop a social recommendation media GIS (Geographic Information Systems) specially tailored to recommend tourist spots. The conclusions of this study are summarized in the following three points. (1) A social media GIS, an information system which integrates a Web-GIS, an SNS and a recommendation system into a single system, was developed and operated in the central part of Yokohama City in Kanagawa Prefecture, Japan. The social media GIS uses a design which displays its usefulness in reducing the constraints of information inspection, time and space, and continuity, making it possible to redesign systems in accordance with target cases. (2) The social media GIS was operated for two months for members of the general public who are more than 18 years old. The total number of users was 98, and the number of pieces of information submitted was 232. (3) The web questionnaires of users showed the usefulness of the integration of Web-GIS, SNS and recommendation systems, because the functions of reference and recommendation can be expected to support tourists' excursion behavior. Since the access survey of log data showed that about 35% of accesses were from mobile information terminals, it can be said that the preparation of an optimal interface for such terminals was effective.

Keywords: Social Recommendation GIS; Web-GIS; Social Media; SNS; Recommendation Systems

I. INTRODUCTION
In recent years the transformation of Japan into an information-intensive society has been progressing, and a variety of information is being transmitted using the internet. Similarly, a variety of information is also being transmitted using the internet in the field of tourism, and the internet has become a primary information source for planning tourist trips and searching for information about the area of a destination. However, due to the large amount of information, and the variety of types of information, it is difficult for users to appropriately select and acquire necessary information by themselves. In particular, the amount of information submitted and made public about tourist spots in urban areas is very large compared to that for regional tourist spots, and it is difficult for people who do not have much knowledge of or acquaintance with the places concerned to efficiently obtain information necessary for taking tourist trips. Therefore, a recommendation system for guiding users to appropriate information is necessary.
Meanwhile, Japanese society has become such that nowadays anyone, anywhere, anytime can use an information system to easily transmit, receive, and share information, and through the effective use of information systems, information possessed by ordinary people can be collected and accumulated. Of the information possessed by people involved in tourist spots as supporters of tourist services, local residents, and people who have visited the tourist spots as tourists, "experience-based knowledge" is the part that exists as "tacit knowledge" that is not visualized if it is not communicated to others. Therefore, by using an information system to change this "experience-based knowledge" into "explicit knowledge" which can be accumulated, organized, utilized, and made public, collecting the knowledge, and having users share the knowledge with each other, it will become possible for users to efficiently obtain necessary information and to go on fulfilling tourist excursions.
Based on the above-mentioned background, the aim of the present study is to uniquely develop a social recommendation GIS (Geographic Information Systems) which integrates a Web-GIS, an SNS, and a recommendation system, and is designed for recommending tourist spots, in order to support users' efficient acquisition of information about tourist spots in urban tourist areas, about which a variety of information is transmitted, by enabling information to be accumulated, shared, and recommended.
Further, the system that is developed is also operated and evaluated, and measures to improve the system are identified. The aim is for the social recommendation GIS of the present study to transform information about tourist spots which is tacit knowledge into explicit knowledge and to accumulate and share the information so that the appeal of the tourist spots is communicated. In addition, the aim is for the social recommendation GIS to support users' efficient acquisition of information about tourist spots by guiding users to appropriate information from among the enormous amount of varied information available.

II. RELATED WORK
The present study is related to three fields of research: (1) Research concerning tourism support systems and methods; (2) Research concerning systems and methods that recommend places such as tourist spots; and (3) Research concerning the development of social media GIS. Following are examples of previous studies in these related fields which focused on tourist information and regional information. In (1) Research concerning tourism support systems and methods, Ishizuka et
al. (2007) [1] proposed a method of searching for similarities in data on movement paths of tourists based on location information and text information related to the location information. Kurata (2012) [2] developed a sightseeing route automatic generation system which utilized a Web-GIS and a genetic algorithm. Kawamura (2012) [3] proposed using a standard tag related to tourism in an SNS, set up a website, and organized tourist information about Hokkaido on the internet.
In (2) Research concerning systems and methods that recommend places such as tourist spots, Kurashima et al. (2011) [4] proposed a method for recommending travel routes that utilizes geotags in a photo-sharing website, and Van Canneyt et al. (2011) [5] proposed a system for recommending tourist attractions. Batet et al. (2012) [6] proposed a system for recommending tourist spots using a multi-agent system, and Uehara et al. (2012) [7] proposed a system which recommends tourist spots by extracting tourist information from the Web and calculating similarities between tourist spots based on multiple feature vectors. Further, among research concerning LBSN (location-based social networks), research concerning recommendation of points of interest (POIs) also belongs to the field of research about systems and methods that recommend places such as tourist spots. Representative examples of research concerning recommendation of POIs are the study by Yu and Chang (2009) [8] in which they proposed a POI recommendation system which supports trip planning, the study by Noguera et al. (2012) [9] in which they proposed a POI recommender system based on location information about present location, and the study by Baltrunas et al. (2011) [10] in which they proposed a POI recommender system based on location information and user preferences. Ye et al. (2011) [11] and Ying et al. (2012) [12] proposed POI recommendation methods based on location information, user preferences, and social networks. Similarly, Bao et al. (2013) [13] proposed a recommender system based on such things. Yuan et al. (2013) [14] proposed a POI recommendation method which took spatio-temporal information into account, and Liu et al. (2013) [15] proposed a POI recommendation method which took changes in user preferences into account.
In (3) Research concerning the development of social media GIS, using a Web-GIS, an SNS, and a wiki, Yanagisawa and Yamamoto (2011) [16] developed a system for accumulating local knowledge in local communities, and Nakahara et al. (2012) [17] developed a system for supporting communication concerning local knowledge in local communities. Further, using a Web-GIS, an SNS, and Twitter, Yamada and Yamamoto (2013) [18] developed a system for information exchange between regions, and Okuma and Yamamoto (2013) [19] developed a system for accumulating urban disaster information.
However, in the previous research mentioned above, there is no system that integrates a Web-GIS, an SNS, and a recommendation system. In the present study, we develop a system which integrates a Web-GIS, an SNS, and a recommendation system, and this makes the system unique. Further, support for efficient acquisition of information about tourist spots which takes into account the preferences of each user is enabled by making accumulation, sharing, and recommendation of information possible in the one system, and in this respect, a synergistic effect of integrating the three above-mentioned applications is obtained. This demonstrates the usefulness of the system. Further, the present study also focuses on information exchange between users, something that until now has not been taken into account very much in studies involving just a recommendation system, and includes a recommendation system in an SNS. Through this, both effective recommendation of tourist spots to each user and information exchange between users which utilizes SNS communication functions are enabled in the one system, and this is another reason the system is useful.

III. RESEARCH OUTLINE AND METHOD
In the present study, research is conducted according to the following outline and method. Firstly, a social recommendation GIS which specializes in the aim of the present study is uniquely designed (Section IV) and developed (Section V). Next, anticipating that users are members of the general public who are more than 18 years old, an operation test and operation of the social recommendation GIS (Section VI) are conducted. Further, the system is evaluated and measures for improving use of the system are identified (Section VII). Anticipating that each user will use the system for about a month, an operation test and an evaluation of the operation test are conducted, and then actual operation is conducted. In addition, web questionnaires are given to users, access is analyzed using log data during the period of actual operation, and submitted information is analyzed. Based on the results of these steps, the system is evaluated, and measures to improve the system in order to more effectively support people taking tourist trips are identified. The central part of Yokohama City in Kanagawa Prefecture was selected as the region for operation. One reason is that this area is a popular urban tourist area, so many tourists visit it; therefore, a lot of information about the area is submitted by people and published, with the result that it is difficult for tourists to efficiently obtain necessary information about the area. A further reason is that since this area has many kinds of tourist spots, the system of the present study can be used to recommend tourist spots that are suited to the preferences of various users.

IV. SYSTEM DESIGN
A. System Features
As shown in Fig. 1, the system proposed by the present study is formed by an integration of three applications: a Web-GIS, an SNS, and a recommendation system. The primary reason for integrating these three applications is that if only a Web-GIS is used, a system is limited to unilateral transmission of information using a digital map; therefore, an SNS was integrated with a Web-GIS to allow interactive transmission and reception of information. The second reason is that, as will be described in detail in Section IV.B.2), in the present study, Environmental Systems Research Institute, Inc.'s (ESRI's) ArcGIS Server is used as the Web-GIS; however, a recommendation system cannot be directly included in the ArcGIS Server. Therefore, the Web-GIS and the recommendation system were included in the SNS to enable the three applications to be integrated together. Accordingly, integrating the three applications, that is, the Web-GIS, the SNS, and the recommendation system, enables
the benefits outlined below to be realized in the one system, and therefore, a synergistic effect of integrating the three applications can be obtained.
Specifically, management and visualization of submitted information on the digital map of the Web-GIS, limitation of users by the uniquely developed SNS, and information sharing and exchange between a limited group of users are enabled. Further, users can submit, view, and evaluate information while gaining a grasp of geographic information related to tourist spot information on the digital map. Moreover, thanks to the inclusion of the recommendation system, information suited to the preferences of each user can be given more priority when information that has been accumulated and shared is provided to users using the digital map. Therefore, even when the system is operated in the long term and an enormous amount of information has been accumulated, each user can be introduced to appropriate information, and it can be anticipated that the system will support efficient acquisition of information about tourist spots. Accordingly, the system's usefulness, mentioned in Section II, can be described in detail in the following manner.
1) Easing of constraints concerning information inspection
As a situation in which information inspection might be restricted, a situation in which a variety of information is submitted and transmitted, the amount of information becomes excessive, and users have difficulty efficiently selecting and obtaining the necessary information can be imagined. Therefore, in order to ease constraints on inspecting information, a recommendation system is included in the system of the present study. This allows the system to appropriately guide each user to information about tourist spots that is suited to their preferences from among a large amount of information in a short time.
2) Easing of time and spatial constraints
As a situation in which time and spatial constraints might arise, a situation in which connection to the internet is not possible can be imagined. In order to ease such constraints, in addition to a PC interface, an interface for mobile information terminals is also provided, so the system can be used anytime regardless of whether a user is indoors or outdoors. Thanks to this, even when users are in the middle of a tourist trip, they can use the functions for viewing and recommending information about tourist spots independent of time and spatial constraints in order to obtain support for efficiently acquiring information about tourist spots.
3) Easing of constraints concerning continuous operation of the system
In order to maintain an environment in which tourist information can be submitted, viewed, and recommended without constraints on time and place, thanks to the features outlined above in 1) and 2), a system design which enables management of information submission is necessary. Further, in the case where the system is designed as one in which anyone can participate, if there is no system that allows the management of submitted information, there is a risk that system operation which conforms with the aims of the system may be difficult when inappropriate information is submitted. However, in the system of the present study, users with malicious intent can be identified, because submitted information is managed in a centralized manner using a database and accounts are managed using the SNS; therefore, long-term operation of the system is possible.
Further, the information terminals focused on for use with the system of the present study are PCs and mobile information terminals. In the latter category, smartphones and tablet-type terminals (which have rapidly come into wide use in recent years) are focused on for use with the system. Both these types of mobile information terminals have touch panels with large screens, which means they are easy to use when dealing with digital maps, and both types allow connection to the internet from anywhere via cellular phone data communication networks. Therefore, they can achieve easing of the time and spatial constraints mentioned above.

Fig. 1. System design of social recommendation GIS

B. System design
1) System configuration
The social media GIS of the present study is formed using three servers: a web server, a database server, and a GIS server. The web server mainly performs processing related to the SNS, and accesses the GIS server and the database server to integrate each of the functions. The SNS is implemented using JavaScript and PHP, and the recommendation system is implemented using PHP. The database server is managed using MySQL, and accumulates submitted information collected through the SNS. For the web server and the database server, the rental server of the information technology center of the organization to which the authors belong was used. For the GIS server, Microsoft Corporation's Windows Server 2008 was used as the OS, and ESRI's ArcGIS Server 10.0 was used as the GIS server software.


2) Web-GIS
In the present study, for the Web-GIS, ESRI's ArcGIS Server 10.0 was used, and for the GIS base map of the Web-GIS, the SHAPE version (Rel.8) of Shobunsha Publications, Inc.'s MAPPLE10000, which is part of their MAPPLE digital map data and includes detailed road system data, was used. As the map that was superimposed with this digital map data, the user interface of Google Maps was used. Among the options provided by ESRI that are ArcGIS Server 10.0 API targets, the Google Maps user interface is the one that has been used the most in previous studies in fields related to the present study. Concerning the superimposition of MAPPLE10000 (SHAPE version) and Google Maps, Google Maps employs the new geodetic system coordinates, while MAPPLE10000 conforms to the former geodetic system coordinates; therefore, ArcTKY2JGD, which is provided by ESRI as product support, was used to convert the MAPPLE10000 geodetic system coordinates to the new coordinates. Furthermore, editing was performed such that information about the region for operation could be input using ArcMap 10.0.
3) SNS
In the present study, an SNS was selected as the social media for integration with the Web-GIS and the recommendation system. The SNS was uniquely designed and developed to suit the objectives of the system. An SNS was chosen because, in contrast to other forms of social media, if an SNS is used the system can be uniquely designed and developed in a way that best suits the objectives of use, and detailed system configuration can be performed in a unique manner to suit regional characteristics of the region in which the system is to be operated. Further, as mentioned in Section IV.A, developing our own unique SNS enabled the information transmission of the system to be interactive, and enabled the recommendation system to also be integrated into the system.
Firstly, features such as those relating to registering and publishing user information and to submitting, viewing, and recommending information were uniquely designed to suit the objectives of the present study. Next, since in this system it is desirable that users voluntarily communicate with each other, in contrast to ordinary SNSs, friend registration and community functions were not designed, and as methods for communication, a comment function and button functions were designed. The comment function is used for communication between users and for providing additions related to submitted information. Concerning the button functions, two buttons were designed: one for "I want to go there" and one for "I didn't know that". These buttons are used for simple communication and for evaluating submitted information. Of the button functions, a ranking function was also added to the "I want to go there" button function. In order to show clearly what kinds of submissions are attracting attention from users, submissions are shown in order of popularity on the submitted information ranking page of the user screen.
4) Recommendation system
There are three methods for recommendation systems: collaborative recommendation, content-based recommendation, and knowledge-based recommendation (Jannach et al., 2012) [20], and in this system, knowledge-based recommendation is used. A reason for this is that knowledge-based recommendation can solve the cold start problem. The cold start problem is that it is difficult to make appropriate recommendations for users new to using the system, and difficult to recommend items which have been newly introduced to the system as items for recommendation. Concerning the problem of difficulty in making appropriate recommendations to new users, Kamishima (2008) [21] pointed out that if knowledge-based recommendation is employed, users directly write their own user profiles themselves, so the problem does not arise. Further, concerning the difficulty of recommending newly introduced items, Kamishima (2008) [21] pointed out that when content-based recommendation or knowledge-based recommendation is employed, if there are user profiles, even new items can be recommended without a problem, using their feature vectors as hints.
Further, this system is for use by ordinary people. Therefore, in the creation of user profiles based on user preference information, it is desirable to use a question format which is clear and intuitively easy to understand. Therefore, the question items are asked using a five-level scale from 1 to 5, and user profile vectors are created. Similarly, regarding tourist spot evaluation information, a submitter of new tourist spot information is asked questions about each evaluation item using a five-level scale from 1 to 5, and tourist spot feature vectors are created. Thus, for the purpose of dealing with the above-mentioned cold start problem as well, in this system a setting is used that means users must input evaluation information when they submit information about a tourist spot. Further, when appealing to people to use the system, this was explained in the operating instructions that were distributed, and people were asked to input evaluation information.
Based on the user profiles of users and tourist spot feature vectors that have been created, degree of similarity is calculated using Equation (1), and tourist spots with a high degree of similarity are recommended.

$$ S_j = \frac{\sum_i u_i \, t_{ij}}{\sqrt{\sum_i u_i^2} \, \sqrt{\sum_i t_{ij}^2}} \qquad (1) $$

$S_j$: Degree of similarity
$u_i$: User preference information
$t_{ij}$: Tourist spot evaluation information
$i$: Question item number
$j$: Tourist spot number

5) Management of submitted information
As mentioned in Section IV.A regarding the easing of constraints concerning continuous operation of the system, a system design which enables management of submitted information is necessary. Therefore, this system aims for long-term operation, and is designed such that no restrictions are imposed when submitted information is made public to all users, but in the case where an administrator determines that a posting has been made by a user with malicious intent or determines that submitted information does not suit the aims of the system, the administrator can exercise rights to delete accounts and delete posts. Specifically, the system is provided with a function that enables centralized management of submitted information through a database.
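The similarity calculation of Section IV.B.4) can be illustrated with a short Python sketch (the system itself is implemented in PHP). It assumes that Equation (1) is the cosine similarity between the user profile vector and each tourist spot feature vector, both holding scores from 1 to 5. The function names and the `top_n` parameter are hypothetical; the default of ten follows the maximum number of recommended spots stated in Section V.

```python
import math


def cosine_similarity(user_profile, spot_features):
    """Cosine similarity between two equal-length vectors of 1-5 scores."""
    if len(user_profile) != len(spot_features):
        raise ValueError("vectors must cover the same question items")
    dot = sum(u * t for u, t in zip(user_profile, spot_features))
    norm_u = math.sqrt(sum(u * u for u in user_profile))
    norm_t = math.sqrt(sum(t * t for t in spot_features))
    # Scores are at least 1, so neither norm can be zero.
    return dot / (norm_u * norm_t)


def recommend(user_profile, spots, top_n=10):
    """Return up to top_n (spot_name, similarity) pairs, most similar first."""
    scored = [(name, cosine_similarity(user_profile, features))
              for name, features in spots.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

Because both vectors are filled in directly by users on the same five-level scale, a newly registered user or a newly submitted spot can be scored immediately, which is how this design sidesteps the cold start problem described above.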


V. SYSTEM DEVELOPMENT

A. System front end
In the present study, as is described in detail below, unique functions for users are implemented, and tourist information is accumulated, shared, and recommended.

1) Information submission function
When users wish to submit information about a tourist spot, they can click “Submit tourist information” on the initial page to move to the submission page. Items the users submit are title, tourist spot category, main text of the submission, images, information on evaluation of the tourist spot, and location information. After writing content or making selections for items other than location information, when users click the location related to the submitted information on the digital map, the location information is stored in MySQL. When the user sends their submission, the submission is complete.

2) Information viewing function
When users wish to read information about tourist spots, they can click “View” on the initial page to move to the viewing page. On the viewing page, check boxes have been created for each category, so by selecting the category they wish to view, users can display a marker. When a user clicks the marker, a bubble with a link to detailed information about the tourist spot will be displayed. Further, the user can click the link in the bubble to move to a page where they can view the detailed information about the tourist spot and can use the comment function and button functions. Users can use the comment function to communicate with other users and to supplement information that has been submitted. Further, concerning the button functions, the two buttons “I want to go there” and “I didn’t know that” can be used for simple communication and to evaluate submitted information. The “I want to go there” button function includes a ranking function, so submitted information can be displayed in order of popularity on the submitted information ranking page on the user screen.

3) Tourist spot recommendation function
When users use the tourist spot recommendation function, they click “Tourist spot recommendation” on the initial page to move to the tourist spot recommendation page. When a user selects the tourist spot category in which they are seeking recommendations on the recommendation page, up to a maximum of ten tourist spots that match the user’s preferences will be displayed as recommendation results, together with check boxes. When the user clicks tourist spot check boxes, markers of those tourist spots are displayed on the digital map of the Web-GIS. Further, as with the viewing function, when the user clicks a marker, a bubble with a link to detailed information about that tourist spot is displayed, and the display changes to a page showing detailed information about the tourist spot which the user can view.

B. System back end
1) Processing related to recommendation system
In this system, processing for calculation of degree of similarity in the recommendation system is performed by the back end, so simply by also registering their preference information as user information, users can receive recommendations about tourist spots. From user preference information saved in the database, user profiles are created, and from information about evaluation of tourist spots saved in the database, tourist spot feature vectors are created. Further, using Equation (1) shown in Section IV.B.4), the degree of similarity of user profiles with each tourist spot is calculated. Then, up to a maximum of ten tourist spots are displayed as recommendation results for the user in descending order of similarity.

2) System for management of submitted information by administrators
Every user’s submissions of information and image files are all accumulated as data in the database of the system. Administrators manage users and check submitted information using a list screen designed especially for the purpose. Administrators can take measures such as suspending accounts of users who have made inappropriate transmissions or behaved inappropriately, and if by any chance an inappropriate submission is made, administrators can delete the submission with just one click. Thanks to these features, there is no need for administrators to check whether or not inappropriate submissions of information have been made within the system; therefore, their burden can be lessened.

C. System interfaces
This system has three kinds of interface – a user PC interface (Fig. 2), a mobile information terminal interface especially optimized for smartphones and tablet-type terminals (Fig. 3), and an administrator PC interface.

VI. OPERATION TEST AND OPERATION

In accordance with the operation process in TABLE I, actual operation of the social recommendation GIS designed and developed in the present study was carried out after an operation test and an evaluation of the operation test had been conducted.

A. Comparison with existing services in region of operation
Yokohama City, the region for operation in the present study, is a popular urban tourist area, so a large amount of information about it is transmitted by various tourist information services. In order to verify the usefulness of the system within the region for operation, results of a comparison of features with existing services were summarized as shown in TABLE II. Examples of existing services that target Yokohama City are the Yokohama City official tourist information website (1) and the website Hamatch! SNS (2). The Yokohama City official tourist information website transmits various information concerning tourism and introduces recommended tourist routes; however, users cannot make submissions to the site. The website Hamatch! SNS mainly accumulates and shares information, and the main purpose of these activities is to support various regional and civic activities. The only support it provides for people taking tourist trips is to display shops and places recommended by local residents by word-of-mouth on a digital map. Further, examples of services targeting the whole of Japan are the websites TripAdvisor (3), MAPPLE Tourist Information (4), and Jalan Tourist Information (5), the travel word-of-mouth website 4travel.jp (6), the website Foursquare (7), and “Facebook Places” (8). These services are


systems which also allow users to make word-of-mouth submissions about tourist spots. However, their support for tourist trips is limited to displaying tourist spot information on digital maps, introducing tourist spot information, and recommending facilities and spots near places chosen by users. They do not take the preferences of each user into account when recommending tourist spots.

Further, as shown in TABLE II, there are existing services which employ digital maps; however, the services other than the system of the present study do not use a Web-GIS; therefore, they are limited to only displaying submitted information on one digital map. They cannot employ the primary functions of a GIS which are for such purposes as digital map editing and superimposition, and information analysis. Nor are they capable of information accumulation, update, addition, correction, inspection, and so on in a digital map. In contrast, the map screen of the GIS has a layered structure (hierarchical structure); therefore, the overlay function can be used to superimpose multiple digital maps which each have different information on the base map. This function is used in the system of the present study. As described in Section IV.B.2), MAPPLE10000 (which includes detailed road system data) was used as the GIS base map, the GIS overlay function was used, and MAPPLE10000 was used in superimposition with the user interface of Google Maps. As a result, reference can be made to a detailed road system on the digital map – a road system which also includes narrow streets output from MAPPLE10000, and it is possible to precisely display tourist spot information and accurately check places related to submitted information. Further, similarly, functions for coordinate conversion and editing were used, coordinates of MAPPLE10000 were converted to suit Google Maps, and editing was performed such that information about the region for operation could be input. Therefore, compared with existing services related to the region of operation, the usefulness of the system of the present study lies in the fact that in order to support efficient acquisition of information about tourist spots, the system allows Web-GIS digital map-based information accumulation and sharing between users, Web-GIS digital map-based information submission and viewing, communication between users via use of the comment function and button functions, and recommendation of tourist spots suited to the preferences of each user.

Fig. 2. PC interface and description of functions


Fig. 3. Mobile information terminal interface and description of functions

B. Tourist spot data
For the present system, with reference to Yamada and Yamamoto (2013) [18] and the websites 4travel.jp and MAPPLE Tourist Information, tourist spots were divided into eight categories, and there were eleven evaluation items (Satisfaction; Access; Crowding; Scale; Worth seeing; Barrier-free; Atmosphere; Quality of attractions; Comfortableness of facilities; Exhibition contents; and Cost) for each tourist spot, to be evaluated using five ranks. Further, in order to deal with the cold start problem mentioned in Section IV.B.4) and allow the tourist spot recommendation function to be used right from the beginning of operation, it is necessary to collect and accumulate information about the main tourist spots in the region for operation. Therefore, forty-four tourist spots in the region for operation that are posted in the aforementioned website “4travel.jp” were selected, and only information about evaluation of the above-mentioned eleven tourist spot evaluation items for these forty-four spots was posted into the system of the present study immediately after start of operation. Before posting the information, the present authors checked its validity.

C. Anticipated users
Two types of users were anticipated for the system, according to whether or not users had knowledge of the region for operation. It was anticipated that users with knowledge about the region of operation would mainly submit information using the submission function, supplement submitted information using the comment function, and evaluate submitted information using the button functions, thus using the system as a tool for accumulating, sharing, evaluating, and recommending information. Further, it was anticipated that users without knowledge about the region of operation would mainly view submitted information using the viewing function, evaluate submitted information using the button functions, and obtain tourist spot recommendations using the recommendation function, thus using the system as a tool for viewing, evaluating, and recommending information.
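
As an illustration only, the eleven evaluation items of Section VI.B and the similarity ranking of Section V.B.1) might be combined as in the following sketch. Equation (1) is not reproduced in this part of the paper, so cosine similarity is used here purely as an assumed stand-in. The class and function names, the category labels "Park" and "Museum", and the idea of expressing a user profile as one weight per evaluation item are all hypothetical and are not taken from the system itself.

```python
from dataclasses import dataclass
from math import sqrt

# The eleven evaluation items named in Section VI.B, each rated on five ranks (1-5).
EVALUATION_ITEMS = (
    "Satisfaction", "Access", "Crowding", "Scale", "Worth seeing",
    "Barrier-free", "Atmosphere", "Quality of attractions",
    "Comfortableness of facilities", "Exhibition contents", "Cost",
)
MAX_RESULTS = 10  # "up to a maximum of ten tourist spots"


@dataclass(frozen=True)
class TouristSpot:
    """Hypothetical record: a spot with one rank per evaluation item."""
    name: str
    category: str
    ratings: tuple

    def __post_init__(self):
        if len(self.ratings) != len(EVALUATION_ITEMS):
            raise ValueError("expected one rating per evaluation item")
        if any(not 1 <= r <= 5 for r in self.ratings):
            raise ValueError("ratings use five ranks (1-5)")


def cosine_similarity(a, b):
    """ASSUMPTION: cosine similarity stands in for the paper's Equation (1)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(user_profile, spots, category=None, limit=MAX_RESULTS):
    """Return up to `limit` spots, ordered by descending similarity."""
    candidates = [s for s in spots if category is None or s.category == category]
    ranked = sorted(
        candidates,
        key=lambda s: cosine_similarity(user_profile, s.ratings),
        reverse=True,
    )
    return ranked[:limit]


# Hypothetical data: a user who cares mostly about Satisfaction and Cost.
profile = (5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5)
spots = [
    TouristSpot("Spot A", "Park", (5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5)),
    TouristSpot("Spot B", "Museum", (1, 5, 5, 5, 5, 5, 5, 5, 5, 5, 1)),
    TouristSpot("Spot C", "Park", (3,) * 11),
]
top = recommend(profile, spots)
```

Seeding the system with evaluations for forty-four spots, as described above, means that a ranking of this kind has feature vectors to compare against even before any user has submitted information.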


TABLE I. OPERATION PROCESS OF THE SYSTEM

1. Survey of present conditions
   Aim: To understand efforts related to tourism in the region for operation (Yokohama City)
   Period: December 2012 - March 2013
   Specific details:
   - Survey of government measures and internet services
   - Interviews with government departments responsible, tourist associations, etc.

2. System configuration
   Aim: Configure the system in detail to suit the region for operation
   Period: April - June 2013
   Specific details:
   - Define system requirements
   - System configuration
   - Create operation system

3. Operation test
   Aim: Conduct the system operation test
   Period: July 2013
   Specific details:
   - Create and distribute pamphlets and operating instructions
   - System operation test

4. Evaluation of operation test
   Aim: Reconfigure the system based on results of interviews with operation test participants
   Period: August - September 2013
   Specific details:
   - Evaluation using interviews
   - System reconfiguration
   - Amendment of pamphlets and operating instructions

5. Operation
   Aim: Carry out actual operation of the system
   Period: October - November 2013
   Specific details:
   - Appeal for use of the system
   - Distribution of pamphlets and operating instructions
   - System operation management

6. Evaluation
   Aim: Evaluate the system based on the results of questionnaires, the results of access analysis which used log data during the period of actual operation, and the results of analysis of submitted information
   Period: November - December 2013
   Specific details:
   - Evaluation using Web questionnaires, access analysis which used log data, and analysis of submitted information
   - Identification of measures for using the system even more effectively

TABLE II. COMPARISON OF FEATURES WITH EXISTING SERVICES RELATED TO REGION FOR OPERATION

Yokohama City official tourist information site
   Aim: Transmitting information
   Support for people taking tourist trips: Introduces recommended tourist routes
   Involves user submissions: No
   Uses a Web-GIS: No

Hamatch! SNS
   Aim: Accumulating and sharing information
   Support for people taking tourist trips: Displays recommended shops and places on a digital map
   Involves user submissions: Yes (Word-of-mouth submissions)
   Uses a Web-GIS: No

TripAdvisor
   Aim: Transmitting and sharing information
   Support for people taking tourist trips: Displays information about popular tourist spots on a digital map
   Involves user submissions: Yes (Word-of-mouth submissions)
   Uses a Web-GIS: No

MAPPLE Tourist Information
   Aim: Transmitting and sharing information
   Support for people taking tourist trips: Displays information about popular tourist spots on a digital map
   Involves user submissions: Yes (Word-of-mouth submissions)
   Uses a Web-GIS: No

Jalan Tourist Information
   Aim: Transmitting and sharing information
   Support for people taking tourist trips: Displays information about popular tourist spots on a digital map
   Involves user submissions: Yes (Word-of-mouth submissions)
   Uses a Web-GIS: No

Travel word-of-mouth site 4travel.jp
   Aim: Transmitting and sharing information
   Support for people taking tourist trips: Introduces information about tourist spots
   Involves user submissions: Yes (Word-of-mouth submissions)
   Uses a Web-GIS: No

Foursquare
   Aim: Accumulating, sharing, and recommending information
   Support for people taking tourist trips: Recommends facilities in categories such as “Food” and “Sights” that are near a place chosen by the user, and displays location of the facilities on a digital map
   Involves user submissions: Yes (Word-of-mouth submissions)
   Uses a Web-GIS: No

“Facebook Places”
   Aim: Accumulating, sharing, and recommending information
   Support for people taking tourist trips: Recommends spots near a place chosen by the user, and displays location of the spots on a digital map
   Involves user submissions: Yes (Word-of-mouth submissions)
   Uses a Web-GIS: No

The system of the present study
   Aim: Accumulating, sharing, and recommending information
   Support for people taking tourist trips: Recommends tourist spots based on information about the preferences of each user, and displays information about tourist spots on a digital map of a Web-GIS
   Involves user submissions: Yes (Submissions of new information about tourist spots)
   Uses a Web-GIS: Yes

D. Operation test and evaluation of operation test
Before actual operation, six students in their twenties were selected as operation test participants and a two-week operation test was conducted. Based on the results of interview surveys with the operation test participants, two things for improvement were identified. One was to use a geolocation API for the recommendation page and enable information about present location to be obtained and displayed. The other was to enable selection of tourist spot categories. The system was reconfigured in regard to these two things only.

E. Operation
Use of the system was appealed for without regard to whether the appeal was aimed at people inside or outside the


region for operation, using such means as the website of the present authors’ laboratory. Further, cooperation from Kanagawa Prefecture and Yokohama City tourism-related departments, the Yokohama Convention & Visitors Bureau (Yokohama City tourist association), and other places was gained in distributing system pamphlets and operating instructions. When users access the system for the first time, they register user information such as “User name”, “E-mail address”, “Age group”, “Gender”, and “Greeting” on the initial registration screen. In order to take into account users who did not want to make their user information public in detail as a profile, the system was designed such that users could freely choose to enter either their real name or an assumed name as their “User name”, and could also choose whether to make their “Age group” and “Gender” public. When users log in after completing initial registration, they can perform operations on the submission, viewing, and recommendation screens. Further, by registering information about their preferences in “My information”, users can receive recommendations of tourist spots suited to their preferences.

TABLE III shows the details of the users during the two-month operation period. There were 49 male users and 49 female users, giving a total of 98 users. About 60% of users were in their twenties, while about 10% were in their thirties and 10% were in their forties. Users in their twenties to forties accounted for more than 80% of the total number of users. This is consistent with the fact that the majority of main users of regular SNSs are in their twenties to forties, as shown in the 2011 White Paper - Information and Communications in Japan [22].

Fig. 4 shows changes in the number of new users and number of new submissions in the weeks during the operation period. The total number of users at the end of the first month of operation was more than 90% of the total number of users for the whole operation period. Further, including the 44 pieces of information that the present authors submitted, as mentioned in Section VI.B, the final total number of submissions of information was 232. Although the number of new submissions made differed each week, more than 40 new submissions were made in each of Weeks 2 and 4. After having each user use the system for about one month, a Web questionnaire was given to users, and use of the system was evaluated.

VII. EVALUATION

In this section, in accordance with the operation process in TABLE I, firstly, based on the results of the questionnaire (for which an outline of users and respondents is shown in TABLE III), an evaluation which concerned the use of the system was conducted. Next, based on the results of access analysis which used log data during the actual operation and analysis of submitted information, an evaluation concerning the aim of the system was conducted. The aim of the system is to support the efficient acquisition of information about tourist spots in urban tourist areas by enabling the information to be accumulated, shared, and recommended. Further, based on these evaluation results, measures for improving the system in order to support tourist trips more effectively were identified.

A. Evaluation concerning use of the system
1) Evaluation concerning ease of use of the system
a) Evaluation concerning information terminals employed to use the system
Concerning information terminals that users employed to use the system, the percentage of users who responded “PC only” was the highest, at about 47%. Next highest was the percentage who responded “Both PC and mobile information terminal”, at about 37%, and the percentage who responded “Mobile information terminal only” was about 16%. From this, it is clear that more than 50% of users employed mobile information terminals to use the system, and this is higher than the percentage of users who employed only PCs to use the system. It can be considered that this is a result of the provision of an interface optimized for mobile information terminals. Further, the percentage of users in their twenties who responded “Both PC and mobile information terminal” was about 32%, and the percentage who responded “Mobile information terminal only” was about 13%, so it is also clear that users in their twenties used the system from a mobile information terminal much more than did users in other age brackets. Therefore, it can be said that particularly for users in their twenties, the provision of a system which can be used anytime, anywhere from mobile information terminals such as smartphones (which have been rapidly coming into wide use in recent years) is very useful.

TABLE III. OUTLINE OF USERS AND RESPONDENTS TO THE QUESTIONNAIRE

                                                Aged 10 to 19  Twenties  Thirties  Forties  Fifties  Sixties and above  Total
Number of users (people)                              3           61        12        12       5             5            98
Number of questionnaire respondents (people)          2           40         8         7       2             3            62
Valid response rate (%)                            66.7         65.6      66.7      58.3    40.0          60.0          63.3
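
The valid response rates in TABLE III follow directly from the two count rows. The short sketch below recomputes them, together with the age-group shares quoted in the text; the variable and function names are illustrative only.

```python
# Counts copied from TABLE III, in column order (excluding the Total column).
AGE_GROUPS = ["Aged 10 to 19", "Twenties", "Thirties", "Forties",
              "Fifties", "Sixties and above"]
users = [3, 61, 12, 12, 5, 5]
respondents = [2, 40, 8, 7, 2, 3]


def response_rate(n_respondents, n_users):
    """Valid response rate as a percentage, rounded to one decimal place."""
    return round(100 * n_respondents / n_users, 1)


rates = {g: response_rate(r, u) for g, u, r in zip(AGE_GROUPS, users, respondents)}
total_rate = response_rate(sum(respondents), sum(users))

# Shares of all users, for comparison with "about 60%" and "more than 80%".
share_twenties = 100 * users[1] / sum(users)
share_twenties_to_forties = 100 * sum(users[1:4]) / sum(users)
```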


Fig. 4. Changes in the number of new users and number of new submissions during the operation period

Note: The number of submissions for Week 1 does not include the 44 pieces of information submitted by the present authors

b) Evaluation concerning usefulness for tourist trips
In order to evaluate the usefulness of the system for people taking tourist trips, users were asked whether or not the system was useful when a user was actually taking a tourist trip (visiting tourist spots, moving between tourist spots, etc.) in the region of operation. The result was that about 39% of users responded “Useful” and about 47% responded “Somewhat useful”, while about 2% responded “Not very useful” and none responded “Not useful”, so it can be seen that the system was highly rated. Therefore, it can be anticipated that the system of the present study will prove useful when users are actually taking tourist trips, through its support for users’ efficient acquisition of information about tourist spots.

2) Evaluation of the system’s unique functions
a) Evaluation of use, classified by function
Users were shown all the functions of the system, then asked to select up to two that they themselves had used the most frequently. Responses showed that the viewing function was the most frequently used function, occupying about 48% of entire frequent function usage. Next was the recommendation function, at about 27%, followed by the button functions, at about 18%. TABLE IV shows cross tabulation results for high usage frequency function and gender. “Viewing function & Recommendation function”, “Viewing function & Button functions”, and “Viewing function” were each selected by many users in the entire group – about 45%, about 25%, and about 13% of users, respectively. No users selected the recommendation function alone as a response. These results show that over 80% of users used the viewing function and about half the users used the combination of the viewing function and the recommendation function when using the system. Therefore, it can be considered that the system was mainly used for obtaining information about tourist spots. Further, looking at results by gender, both genders used the viewing function itself often. However, about 17% of males responded “Viewing function” and about 31% responded “Viewing function & Button functions”, and these percentages are higher than those for female respondents by about 9% and about 8%, respectively. Meanwhile, the percentage of females who responded “Viewing function & Recommendation function” was 50%, about 11% higher than that for male respondents, and about 8% of females responded “Viewing function & Comment function”, while no males selected this response. These results show that there was variance between males and females for functions which were frequently used.

TABLE V shows cross tabulated results for responses in the “Total” section of TABLE IV and responses regarding the usefulness of the system for taking tourist trips, which was mentioned in Section VII.A.1).b). Among respondents who answered that the system was “Useful” or “Somewhat useful” when taking a tourist trip, respondents who selected the viewing function either alone or in combination with another function as the function or functions of the system they used the most frequently formed the largest proportion (About 31% of respondents answered “Useful” and selected the viewing function as a most frequently used function, and about 42% answered “Somewhat useful” and selected the viewing function as a most frequently used function). Among this proportion of respondents, in particular the proportion who selected “Viewing function & Recommendation function” as the functions they used the most frequently was the largest (About 16% of respondents answered “Useful” and selected “Viewing function & Recommendation function” as their most frequently used functions, and about 19% answered “Somewhat useful” and selected “Viewing function & Recommendation function” as their most frequently used functions). These results show that by using the viewing function in combination with the recommendation function in particular, information about tourist spots can be efficiently obtained, and therefore it can be anticipated that the system will be useful when users are actually taking tourist trips. Therefore, as mentioned in Section IV.A about the synergistic effect of integrating three applications - a Web-GIS, an SNS, and a recommendation system - to form one system, it can be said that it was useful to include a Web-GIS and a recommendation system in an SNS in order to enable users to access the system from either a PC or a mobile information terminal to view tourist spots and receive recommendations for tourist spots using a digital map.

b) Detailed evaluation of recommendation function
Fig. 5 shows results for responses to four question items on the recommendation function, the function which demonstrates the uniqueness of the system the most clearly. The two question items on usefulness – the usefulness of the present location display on the recommendation page and the usefulness of tourist spot category selection on the recommendation page – concern the areas for improvement that were identified based on the results of the evaluation of operation test, described in Section VI.D. The proportion of respondents who answered “Suitable” or “Somewhat suitable” concerning the suitability of the tourist spots recommended was about 74%. Therefore, it can be considered that for a majority of users, suitable results were obtained when tourist spots with a high degree of similarity to preferences of users were recommended by calculating degree of similarity based on preference information registered by each user in the “My information” section of the system and information about evaluation of each tourist spot. Further, concerning the other


three questions - that is, the questions on the suitability of the recommendations of up to a maximum of ten tourist spots, the usefulness of the present location display on the recommendation page, and the usefulness of category selection on the recommendation page - the proportion of respondents who answered “Suitable”/“Useful” or “Somewhat suitable”/“Somewhat useful” for these questions was about 77%, about 77%, and about 84% respectively, meaning these features were rated very highly. Based on these results, it can be said that in addition to high ratings for the tourist spot recommendation results and the feature of recommending up to a maximum of ten tourist spots (which was included in the system design from the beginning), the two features related to the improved recommendation page (present location display, tourist spot category selection), which was improved based on the results of the evaluation of operation test, also obtained high ratings.

B. Evaluation of support of acquisition of tourist spot information
1) Evaluation focusing on access count and access methods
a) Outline of access analysis
In the present study, evaluation focusing on access count and access methods was conducted by analyzing access using log data during actual operation. In the present study, a Google Analytics API was included in the program that was developed and access was analyzed. Google Analytics is a free application provided by Google, and is often used as an analysis tool. Google Analytics can be used simply by adding the API into the program of each page of a website. Once this is done, access logs can be obtained.

b) Evaluation based on results of access analysis
About 65% of all visits to the present system (a total of 3,524 visits) were those which accessed the system from a PC. However, access from mobile information terminals also comprised a not insubstantial amount of the total – about 35%. Concerning this point, in the study by Yamada and Yamamoto (2013) [18], access from mobile information terminals to the social media GIS designed for information exchange between regions comprised about 12% of the total access count in 2012. Comparing results, it is clear that the proportion of access from mobile information terminals in the present study is about three times that amount. Further, Yamada and Yamamoto (2013) [18] did not provide an interface optimized for mobile information terminals in their study, so it can be said that the provision of an interface for mobile information terminals (whose use has been spreading rapidly in recent years) in the system of the present study was useful.
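
The "about three times" comparison of mobile access can be checked with simple arithmetic on the figures quoted in the access analysis. Because the paper gives only rounded percentages, the per-terminal visit counts computed below are rough estimates rather than reported values, and all variable names are illustrative.

```python
TOTAL_VISITS = 3524          # total visits reported for the present system
PC_SHARE = 0.65              # "about 65%" of visits came from PCs
MOBILE_SHARE = 0.35          # "about 35%" came from mobile information terminals
PRIOR_MOBILE_SHARE = 0.12    # mobile share reported by Yamada and Yamamoto (2013) [18]

# Rough visit counts implied by the rounded shares (estimates only).
approx_pc_visits = round(TOTAL_VISITS * PC_SHARE)
approx_mobile_visits = round(TOTAL_VISITS * MOBILE_SHARE)

# Ratio behind the "about three times" statement.
mobile_ratio = MOBILE_SHARE / PRIOR_MOBILE_SHARE
```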
TABLE IV. FUNCTIONS USED THE MOST FREQUENTLY IN THE SYSTEM (UP TO TWO SELECTED)

Values are percentages of respondents in each group.

Function(s) selected                               Males (36 people)  Females (26 people)  Total (62 people)
Viewing function                                         16.7                 7.7                12.9
Viewing function & Comment function                       0.0                 7.7                 3.2
Viewing function & Button functions                      30.6                23.1                27.4
Viewing function & Recommendation function               38.9                50.0                43.5
Recommendation function & Comment function                2.7                 0.0                 1.7
Recommendation function & Button functions                2.7                 7.7                 4.8
Submission function                                       5.6                 0.0                 3.2
Submission function & Button functions                    0.0                 3.8                 1.7
Submission function & Viewing function                    2.8                 0.0                 1.6
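
The shares of "entire frequent function usage" quoted in Section VII.A.2).a) (about 48%, 27%, and 18%) can be reproduced from the Total row of TABLE IV by counting each selected function once per respondent. The sketch below does this; the assignment of the values 1.7 and 4.8 to the two recommendation-function combinations is inferred from the flattened table header and should be treated as an assumption, although it is the assignment under which the button functions come out at about 18%.

```python
# Total row of TABLE IV (% of the 62 respondents), keyed by the functions selected.
# ASSUMPTION: 1.7 belongs to Recommendation & Comment and 4.8 to Recommendation & Button.
TOTAL_ROW = {
    ("Viewing",): 12.9,
    ("Viewing", "Comment"): 3.2,
    ("Viewing", "Button"): 27.4,
    ("Viewing", "Recommendation"): 43.5,
    ("Recommendation", "Comment"): 1.7,
    ("Recommendation", "Button"): 4.8,
    ("Submission",): 3.2,
    ("Submission", "Button"): 1.7,
    ("Submission", "Viewing"): 1.6,
}


def usage_shares(row):
    """Share of all function selections that went to each function, in percent."""
    counts = {}
    for functions, pct in row.items():
        for name in functions:
            counts[name] = counts.get(name, 0.0) + pct
    total = sum(counts.values())
    return {name: 100 * value / total for name, value in counts.items()}


shares = usage_shares(TOTAL_ROW)

# Percentage of respondents who selected the viewing function at all.
viewing_users = sum(pct for functions, pct in TOTAL_ROW.items() if "Viewing" in functions)
```

Under these assumptions, `viewing_users` also corresponds to the statement that over 80% of users used the viewing function.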

TABLE V. RELATIONSHIP BETWEEN USEFULNESS OF THE SYSTEM FOR TAKING TOURIST TRIPS AND FUNCTIONS WITH A HIGH USAGE FREQUENCY

Values are percentages of all respondents.

Function(s) selected                               Useful  Somewhat useful  Can’t say either way  Not very useful
Viewing function                                     4.8         4.8                3.2                 0.0
Viewing function & Comment function                  1.6         1.6                0.0                 0.0
Viewing function & Button functions                  8.2        16.1                1.6                 1.6
Viewing function & Recommendation function          16.1        19.4                8.1                 0.0
Recommendation function & Comment function           1.7         0.0                0.0                 0.0
Recommendation function & Button functions           3.2         1.6                0.0                 0.0
Submission function                                  0.0         3.2                0.0                 0.0
Submission function & Button functions               1.6         0.0                0.0                 0.0
Submission function & Viewing function               1.6         0.0                0.0                 0.0


Fig. 5. Results for responses to four question items concerning recommendation function

Note: The response options shown on the right in the explanatory notes in the figure are for the two question items on usefulness – the usefulness of the present
location display on the recommendation page, and the usefulness of tourist spot category selection on the recommendation page.

TABLE VI shows the ten pages with the most number of tourist trips, operation of the following two functions can be
visits from each type of information terminal. Visits from both proposed.
PCs and mobile information terminals were mainly to the
viewing page and the recommendation page. Meanwhile, it can • Routing function
be seen that there were visits from mobile information This function would enable display of a route to a
terminals to the viewing page and the recommendation page recommended tourist spot from the user’s present location or
for PCs, and the number of visits to the submission page from from any location specified by the user. Further, by linking
PCs was more than twice the number of visits to the routes between multiple recommended tourist spots, it would
submission page from mobile information terminals. Further, enable them to be displayed as a tourist route. Through this, it
pages for viewing detailed information on individual tourist can be anticipated that users will be able to plan tourist trips
spots (Uchikipan Bakery and Red Brick Park) are in the list of taking into account travel time and travel methods.
the top ten most visited pages for PCs, but not in the list of the
top ten most visited pages for mobile information terminals.  Additional function for viewing page
Therefore, comparing use of the system from PCs to that from It is proposed that the display method be changed such that
mobile information terminals, it is thought that there was more on the page for viewing detailed information about tourist spots,
use of the system for submitting information on tourist spots the tourist spot being viewed stands out on the digital map.
and viewing detailed information on tourist spots from PCs Further, it is proposed that categories such as “For families”,
than there was from mobile information terminals. Further, it is “For groups”, and “For solo travel” be added to the category
thought that users accessing the system from mobile divisions for submissions of information, to enable viewing of
information terminals mainly visited the viewing page and the submitted information that is even more suited to the
recommendation page, and often used the system to obtain behavioral characteristics of each user.
simple information about tourist spots.
VIII. CONCLUSION
2) Evaluation focusing on submitted information
TABLE VII shows how many pieces of information were The conclusion of the present study can be summarized
submitted to each tourist spot category from among the 232 into the following three points.
pieces of information submitted. Looking at the results by (1) As an information system for recommending tourist
category, there were many submissions concerning restaurants spots, a social recommendation GIS which integrated three
and cafes – 75 (about 32% of the total); however, submissions applications – an SNS, a Web-GIS, and a recommendation
of information were made in all categories. Therefore, it can be system – was designed and developed. Developing a system
said that a variety of tourist spot information was submitted, which integrated these applications enabled constraints
and in line with the aim of the system, information for concerning information inspection by users, time and spatial
recommending tourist spots suited to the preferences of each constraints, and constraints concerning continuous operation of
user was accumulated. the system to be eased. The central part of Yokohama City in
C. Identification of measures to improve the system Kanagawa Prefecture was selected as the region for operation,
and system details were configured after a survey of present
Based on the results of the evaluation outlined in this
conditions was carried out.
session, in order to more effectively support people taking


TABLE VI. TOP TEN VISITED PAGES, CLASSIFIED BY INFORMATION TERMINAL USED TO VISIT PAGE

PC
Rank Page name Number of visits Percentage (%)
1 Initial page 257 11.1
2 Viewing page 140 6.1
3 Recommendation page (Category selection) 138 6.0
4 Submission page 97 4.2
5 Logout page 95 4.1
6 Recommendation page (Display of recommendation results) 88 3.8
7 My information page 81 3.5
8 Initial page (for mobile information terminals) 59 2.6
9 Tourist spot detailed information viewing page (Uchikipan Bakery) 23 1.0
10 Tourist spot detailed information viewing page (Red Brick Park) 19 0.8
Total 2,308 100.0

Mobile information terminals


Rank Page name Number of visits Percentage (%)
1 Initial page 212 17.4
2 Initial page (for mobile information terminals) 94 7.7
3 Viewing page (for mobile information terminals) 54 4.4
4 Recommendation page (Category selection) (for mobile information terminals) 50 4.1
5 Viewing page 46 3.8
6 Logout page (for mobile information terminals) 46 3.8
7 Recommendation page (Category selection) 44 3.6
8 Submission page (for mobile information terminals) 38 3.1
9 Recommendation page (Display of recommendation results) (for mobile information terminals) 36 3.0
10 Recommendation page (Display of recommendation results) 28 2.3
Total 1,216 100.0
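
The shares quoted in the discussion of TABLE VI can be rechecked directly from the table. The following is a minimal Python sketch, not part of the original study. It assumes that the two "Total" rows together make up the total access count, and that the submission-page comparison uses the PC row (rank 4) against the mobile-terminal row (rank 8). The variable and function names are illustrative only.

```python
# Visit totals from the "Total" rows of TABLE VI (all pages, not only the top ten).
PC_TOTAL_VISITS = 2308
MOBILE_TOTAL_VISITS = 1216

# Submission-page visits from TABLE VI: rank 4 for PCs, rank 8 for mobile terminals.
# Assumption: visits from mobile terminals to the PC version of the submission page
# fall outside the mobile top ten and are therefore not counted here.
PC_SUBMISSION_VISITS = 97
MOBILE_SUBMISSION_VISITS = 38


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Share of all recorded visits that came from mobile information terminals.
mobile_share = share(MOBILE_TOTAL_VISITS, PC_TOTAL_VISITS + MOBILE_TOTAL_VISITS)

# How many times more often the submission page was visited from PCs.
submission_ratio = PC_SUBMISSION_VISITS / MOBILE_SUBMISSION_VISITS

print(f"Mobile share of all visits: {mobile_share}%")
print(f"PC/mobile submission-page ratio: {submission_ratio:.2f}")
```

Under these assumptions the mobile share comes out at 34.5%, in line with the "about 35%" stated in the conclusion, and the submission-page ratio is about 2.55, consistent with "more than twice".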

TABLE VII. SUBMISSIONS OF INFORMATION, CLASSIFIED BY TOURIST SPOT CATEGORY

Category Number of submissions Percentage (%)
Restaurants/Cafes 75 32.3
Other eating/drinking establishments 11 4.6
Noted places/Historic sites 41 17.7
Shopping 26 11.2
Theme parks/Parks 21 9.1
Art galleries/Museums 22 9.5
Scenery 15 6.5
Other 21 9.1
Total 232 100.0

(2) Operation was to be conducted over a two-month period; therefore, prior to operation, an operation test was conducted for two weeks, areas where the system could be improved were identified, and the system was reconfigured. It was intended that members of the general public over 18 years of age, both inside and outside the region for operation, would be users of the system. Meanwhile, of the 98 actual users of the system, over 80% were in their twenties to forties. This is consistent with the fact that the main user base of regular SNSs consists of people in their twenties to forties. Concerning change in the number of users, the number of users one month after the start of operation reached over 90% of the total number of users for the whole period of operation. The final total number of items of information submitted was 232.

(3) Results of a questionnaire given to users of the system after it was operated showed that it can be anticipated that the viewing function and the recommendation function in particular will lead to effective support for people taking tourist trips, and also showed the usefulness of having integrated a recommendation system with a Web-GIS and an SNS. Further, access analysis which used log data confirmed that of the total access count, about 35% of access was from mobile information terminals, so it can be said that the provision of an interface optimized for mobile information terminals proved useful.

An example of a future topic for research is to implement the functions proposed in Section VII.C, in order to support people taking tourist trips more effectively. Another example is to operate the system in other urban tourist areas, boost the


track record of use of the system, and increase the significance of using the system.

ACKNOWLEDGMENT
In the operation of the social recommendation GIS and the questionnaires of this study, enormous cooperation was received from those mainly in the Kanto region such as Kanagawa prefecture and Tokyo Metropolis. We would like to take this opportunity to gratefully acknowledge them.

NOTES
(1) Yokohama City official tourist information site: http://www.welcome.city.yokohama.jp/ja/. (accessed December 20, 2013). (Website)
(2) Hamatch! SNS: http://sns.hamatch.jp/. (accessed December 15, 2013). (Website)
(3) TripAdvisor: http://www.tripadvisor.jp/. (accessed December 20, 2013). (Website)
(4) MAPPLE Tourist Information: http://www.mapple.net/. (accessed December 20, 2013). (Website)
(5) Jalan Tourist Information: http://www.jalan.net/kankou/. (accessed December 20, 2013). (Website)
(6) The travel word-of-mouth site “4travel.jp”: http://4travel.jp/. (accessed December 20, 2013). (Website)
(7) Foursquare: https://ja.foursquare.com/. (accessed June 25, 2014). (Website)
(8) Facebook Places: https://www.facebook.com/directory/places/. (accessed June 25, 2014). (Website)

REFERENCES
[1] J. Ishizuka, Y. Suzuki and K. Kawagoe, “Method for searching for similarities in data on movement paths, designed to support sightseeing in Kyoto”, The Special Interest Group Technical Reports of Information Processing Society of Japan, CVIM, “Computer Vision and Image Media”, 2007(1), pp.17-23, 2007.
[2] Y. Kurata, “Introducing a hot-start mechanism to a web-based tour planner CT-Planner and Increasing its coverage areas”, Papers and Proceedings of the Geographic Information Systems Association of Japan, Vol.21, CD-ROM, 2012.
[3] H. Kawamura, “Efforts to spread standard tags in Hokkaido tourist information, and the development of kyun-channel”, Journal of Digital Practice, Vol.3, No.4, pp.272-280, 2012.
[4] T. Kurashima, T. Iwata, G. Irie and K. Fujimura, “Travel Route Recommendation using Geotags on Photo Sharing Service”, Technical Report of The Institute of Electronics, Information and Communication Engineers, LOIS, “Life Intelligence and Office Information Systems”, Vol.109, No.450, pp.55-60, 2010.
[5] S. Van Canneyt, S. Schockaert, O.V. Laere and B. Dhoedt, “Time-dependent recommendation of tourist attractions using Flickr”, Proceedings of the 23rd Benelux Conference on Artificial Intelligence, pp.255-262, 2011.
[6] M. Batet, A. Moreno, D. Sánchez, D. Isern and A. Valls, “Turist@: Agent-based personalised recommendation of tourist activities”, Expert Systems with Applications, Vol.39, No.8, pp.7319-7329, 2012.
[7] H. Uehara, K. Shimada and T. Endo, “Sightseeing location recommendation using tourism information on the Web”, Technical Report of The Institute of Electronics, Information and Communication Engineers, NLC, “Natural Language Understanding and Models of Communication”, Vol.112, No.367, pp.13-18, 2012.
[8] C. C. Yu and H. P. Chang, “Personalized location-based recommendation services for tour planning in mobile tourism applications”, Proceedings of the 10th International Conference on E-Commerce and Web Technologies, pp.38-49, 2009.
[9] J. M. Noguera, M. J. Barranco, R. J. Segura and L. Martinez, “A mobile 3D-GIS hybrid recommender system for tourism”, Information Sciences, Vol.215, pp.37-52, 2012.
[10] L. Baltrunas, B. Ludwig, S. Peer and F. Ricci, “Context-aware places of interest recommendations for mobile users”, Design, User Experience, and Usability. Theory, Methods, Tools and Practice, Lecture Notes in Computer Science, Vol.6769, pp.531-540, 2011.
[11] M. Ye, P. Yin, W. C. Lee and D. L. Lee, “Exploiting geographical influence for collaborative point-of-interest recommendation”, Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp.325-334, 2011.
[12] J. J. C. Ying, E. H. C. Lu, W. N. Kuo and V. S. Tseng, “Urban point-of-interest recommendation by mining user check-in behaviors”, Proceedings of the ACM SIGKDD International Workshop on Urban Computing, pp.63-70, 2012.
[13] J. Bao, Y. Zheng and M. F. Mokbel, “Location-based and preference-aware recommendation using sparse geo-social networking data”, Proceedings of the 20th International Conference on Advances in Geographic Information Systems, pp.199-208, 2012.
[14] Q. Yuan, G. Cong, Z. Ma, A. Sun and N. M. Thalmann, “Time-aware point-of-interest recommendation”, Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp.363-372, 2013.
[15] X. Liu, Y. Liu, K. Aberer and C. Miao, “Personalized point-of-interest recommendation by mining users’ preference transition”, Proceedings of the 22nd ACM International Conference on Information and Knowledge Management, pp.733-738, 2013.
[16] T. Yanagisawa and K. Yamamoto, “Study on Information Sharing GIS to Accumulate Local Knowledge in Local Communities”, Theory and Applications of GIS, Vol.20, No.1, pp.61-70, 2012.
[17] H. Nakahara, T. Yanagisawa and K. Yamamoto, “Study on a Web-GIS to Support the Communication of Regional Knowledge in Regional Communities: Focusing on Regional Residents’ Experiential Knowledge”, Socio-Informatics, Vol.1, No.2, pp.77-92, 2012.
[18] S. Yamada and K. Yamamoto, “Development of Social Media GIS for Information Exchange between Regions”, International Journal of Advanced Computer Science and Applications, Vol.4, No.8, pp.62-73, 2013.
[19] T. Okuma and K. Yamamoto, “Study on a Social Media GIS to Accumulate Urban Disaster Information: Accumulation of Disaster Information during Normal Times for Disaster Reduction Measures”, Socio-Informatics, Vol.2, No.2, pp.49-65, 2013.
[20] D. Jannach, M. Zanker, A. Felfernig and G. Friedrich, “Recommender Systems: An Introduction”, Cambridge University Press, U.K., 2011.
[21] T. Kamishima, “Algorithms for recommender systems (2)”, Transactions of Japanese Society of Artificial Intelligence, Vol.23, No.1, pp.89-103, 2008.
[22] Ministry of Internal Affairs and Communications of Japan, “2011 White paper - information and communications in Japan”, Tokyo, 2011.


A Feasibility Study on Porting the Community Land Model onto Accelerators Using OpenACC

D. Wang
Climate Change Science Institute, Oak Ridge National Lab, Oak Ridge, TN, USA

W. Wu
Department of Computer Science, University of Tennessee, Knoxville, TN, USA

F. Winkler, W. Ding, O. Hernandez
Computer Science and Mathematics Division, Oak Ridge National Lab, Oak Ridge, TN, USA

Abstract—As environmental models (such as the Accelerated Climate Model for Energy (ACME), the Parallel Reactive Flow and Transport Model (PFLOTRAN), the Arctic Terrestrial Simulator (ATS), etc.) become more and more complicated, we face enormous challenges in porting those applications onto hybrid computing architectures. OpenACC has emerged as a very promising technology; we have therefore conducted a feasibility analysis on porting the Community Land Model (CLM), a terrestrial ecosystem model within the Community Earth System Model (CESM). Specifically, we used an automatic function testing platform to extract a small computing kernel out of CLM, applied this kernel within the actual CLM dataflow procedure, and investigated the strategy of data parallelization and the benefit of data movement provided by the current implementation of OpenACC. Even though it is a non-intensive kernel, on a single 16-core computing node the performance (based on the actual computation time using one GPU) of the OpenACC implementation is 2.3 times faster than that of the OpenMP implementation using a single OpenMP thread, but 2.8 times slower than that of the OpenMP implementation using 16 threads. On multiple nodes, the MPI_OpenACC implementation demonstrated very good scalability on up to 128 GPUs on 128 computing nodes. This study also provides useful information for us to look into the potential benefits of the “deep copy” capability and the “routine” feature of the OpenACC standard. We believe that our experience with the environmental model, CLM, can be beneficial to many other scientific research programs that are interested in porting their large-scale scientific code using OpenACC onto high-end computers empowered by hybrid computing architectures.

Keywords—OpenACC; Climate Modeling; Community Land Model; Functional Testing; Performance Analysis; Compiler-assisted Analysis

I. INTRODUCTION
As environmental models (such as the Accelerated Climate Model for Energy (ACME), the Parallel Reactive Flow and Transport Model (PFLOTRAN), the Arctic Terrestrial Simulator (ATS), etc.) become more and more complicated, we face enormous challenges in porting those applications onto hybrid computing architectures. OpenACC has emerged as a very promising technology. In this paper, we present our feasibility study on porting the Community Land Model (CLM) within the Community Earth System Model using OpenACC.

Over the past several decades, researchers have made significant progress in developing high-fidelity earth system models to advance our understanding of the earth system and to improve our capability to project future scenarios [1]. The Community Earth System Model (CESM) is one of the leading US earth system models. CESM is being actively developed under the “Accelerated Climate Model for Energy (ACME)” project to support the Department of Energy’s climate and environmental research. Within the CESM framework, the CLM is designed to understand how natural and human changes in ecosystems affect climate [2]. The model represents several aspects of the land surface, including surface heterogeneity, and consists of submodels related to land biogeophysics, the hydrologic cycle, biogeochemistry, human dimensions, and ecosystem dynamics. Currently, the offline CLM simulation system consists of more than 1800 source files and over 350,000 lines of source code. It is well known that the software complexity of the Community Land Model has become a barrier to rapid model improvement and validation, as well as to efficient code porting to next-generation HPC systems [3,4].

The main purposes of the efforts described in this paper are to: 1) test data-parallel schemes based on the current CLM high-level dataflow using a simple, non-compute-intensive function; 2) investigate the usefulness of a selective copy implementation within a CLM simulation; and 3) evaluate the benefit and cost of porting CLM onto accelerators using OpenACC. The paper is organized as follows. We first provide an overview of the CLM software structure and dependency, which leads to our effort to develop a scientific function testing system. Using this testing system, we have extracted one computational kernel out of the whole system and designed a computational experiment for our model porting practices, as well as for the evaluation of the model's computational performance, using both OpenMP and OpenACC.

II. CLM SOFTWARE DEPENDENCY, DATA STRUCTURE AND WORKFLOW
The software system of the global offline CLM includes physical earth system components, such as the CLM, data atmosphere (a proxy atmosphere model, which reads in atmospheric forcings to drive the CLM), stub ocean, stub ice and stub glacier. It contains an application driver to configure the parallel computing environment and the whole simulation system (physical earth system components and the flux coupler between those components). It also includes several shared software modules and utilities, such as a flux coupler and its APIs to individual earth system components, parallel I/O and performance profiling libraries [4,5]. The schematic diagram of the CLM software structure is shown in Figure 1. It is clear that


the CLM simulation is highly dependent on other components, such as the flux coupler and the data atmosphere.

Fig. 1. Software configuration of a global offline CLM simulation that shows a strong coupling with other earth system components. Several earth system model components are listed, including a land model (Land), a data atmospheric model (Data Atmosphere), stub sea ice model (Ice), ocean model (Ocn) and glacier model (Glc)

Fig. 2. Hierarchical, derived data structure to represent the heterogeneity of the CLM landscape surface

The key data structure of CLM is a globally accessible derived data type, designed to represent the heterogeneity of the landscape surface. Figure 2 shows the CLM data structure in memory. Each layer of the data structure contains two groups of variables: 1) mapping indexes to represent the spatial connections between the four layers (gridcell, landunit, column, and plant function type (PFT)); and 2) derived datatypes to store physical data associated with each layer, including energy, water, momentum, flux, etc.

In the CLM, each gridcell, landunit, soil column, and PFT has a unique ID number. These multiple-level ID numbers are used to create the mapping indexes between the hierarchical landscape surface data structures. The computational domain partition depends on the total number of gridcells across the whole landscape. A static domain-partitioning scheme is implemented in the CLM, so the numbers of PFTs, soil columns, landunits, and gridcells are fixed on each process during the simulation; most importantly, there is no cross-domain communication at any layer of the landscape data structure. In other words, CLM, at its current stage, is a very good candidate for data parallelism using GPUs. Furthermore, a web-based visual analytic system has been developed to explore the CLM software structure, an improvement on our previous visual analytics [6]. It provides a much-needed interface for CLM software structure exploration and further benefits model interpretation and new module development (URL: http://cem-base.ornl.gov/CLM_Web/CLM_Web.html).

III. SOFTWARE DEPENDENCE AND SCIENTIFIC FUNCTIONAL TESTING SYSTEM
It is obvious that porting the whole CLM simulation system onto accelerators is a very challenging task, considering the complexity of the code itself and, more importantly, the heavy software dependency on a variety of external math libraries and other earth system components. However, we have implemented an automatic ecosystem function testing system, which is able to extract a specific subroutine/module from CLM and to generate a standalone functional test module for the given subroutine/module. It is a significant improvement on our previous function test platform [7]. Using this testing system, we have successfully tested most ecosystem modules, and it can be extended to all submodels in CLM or even CESM. Originally, it was designed to create direct linkages between site measurements and key ecosystem functions within CLM. It provides much-needed integration interfaces for both field experimentalists and ecosystem modelers to improve the model’s representation of ecosystem processes within the CESM framework without large software overhead. For completeness, we briefly describe the functional testing system, shown in Figure 3, here.

Fig. 3. The software structure of the ecosystem function test system. The “Shared Library” component contains the CLM key data structure and other shared software utilities. For a given CLM function (a single subroutine or a group of subroutines in the “Models” component), our system can generate a corresponding test module, located in the “Unit Test Module” component, which is in turn driven by the “Unit Test Driver”

As shown in Figure 3, the testing system contains a “Shared Library” component, which includes modules commonly used by most CLM functions, such as the key data structure (clm_type), physical/chemical/ecological constants (clm_const, pft_const, etc.) and other utilities (such as string manipulation functions). The “Models” component contains most of the software subroutines and modules related to ecosystem functions in CLM, which are exactly the same as the ones in CLM. In order to increase the software system’s portability across computing platforms, we have decoupled CLM's connections with other CESM components, such as Coupler and Atmosphere, and we have removed several external


libraries and components (such as MPI, NetCDF, PIO and Coupler) from the original source code using proxy libraries or components. The key data structure used by CLM (clm_type) is kept unchanged, so data used for CLM can be directly used in the testing program. The “Unit Test Module” component contains a set of automatically generated unit test modules for given ecosystem functions of interest. The “Unit Test Driver” performs initialization to ensure that memory for the variables used by a unit test module is allocated; it also executes the unit test module and verifies the testing results.

IV. CASE STUDY CONFIGURATION
As mentioned in the previous section, CLM is a very complicated modeling system, and rewriting the majority of the code using CUDA APIs would be a very challenging task; we therefore favor the high-level directive-based approach using OpenACC [8]. In this study, we focus on testing data-parallel schemes based on the current CLM high-level dataflow using one specific, simple and non-compute-intensive ecosystem function (kernel), CNGResp. Within CLM, the CNGResp module is designed to update all the growth respiration fluxes (the prognostic carbon state variables) at each timestep. The schematic procedure of the CNGResp module is shown in Figure 4.

1. define local pointers to the global arrays within clm_type
2. assign local pointers to derived type arrays (input)
3. assign local pointers to derived type arrays (output)
4. loop through pfts to update leaf and fine root growth respiration

Fig. 4. Schematic procedure of CNGResp Module within CLM

In total, 19 global variables within clm_type are used as input datastreams and 18 global variables are used as both input and output datastreams. The computational experiments are configured using settings similar to those of a half-degree offline CLM simulation. Specifically, our landscape surface data structure contains 62482 gridcells, 83935 landunits, 135628 soil columns, and 1101228 plant function types. The workflow of our computational experiment is designed as follows: in each timestep, we copy all the global variables (both input and output datastreams, 37 arrays in total) onto the GPU memory, then we break the loops into parallel computation on GPU cores, and copy back these datastreams. Specifically, we first copy the user-defined hierarchical data structure and around 300 MB of data onto GPU memory. Then, after all the computation is done among the CUDA cores, all the data are copied out of the GPU to the corresponding CPU memory locations.

This experiment gives a good opportunity to investigate the usefulness and efficiency of a selective copy implementation of one individual function within a CLM simulation. Because the CNGResp module is a non-compute-intensive kernel, it can be used as a benchmark case to evaluate the benefit and cost of porting other CLM kernels onto accelerators using similar OpenACC features.

V. MPI_OPENMP IMPLEMENTATION
Since in most cases CLM is configured to run with global data, in this section we present a way to wrap the CNGResp model with MPI and OpenMP to maximize parallel computation. According to Figure 2, the top level of the CLM data structure is the grid cell. Each grid cell is independent, so a generic method is to parallelize the program by grid cell, with each MPI process operating on one or more grid cells. However, in this particular CNGResp model, the subroutine operates at the column level, so instead of dealing with grid cells and land units, we only have to parallelize over columns. Assume the number of columns is C and the number of MPI processes is NP; then each MPI process operates on C/NP columns. Since most current CPUs have multiple cores, we use OpenMP to parallelize computation on each MPI process. If there are NT OpenMP threads, each OpenMP thread operates on C/NP/NT columns. The CNGResp model does not access all the pfts in each column, so each OpenMP thread has its private pft filter variable to get access to the filtered pfts. Algorithm 1 shows the pseudo code of the MPI-OpenMP implementation of the CNGResp model.

Algorithm 1 MPI_OpenMP based partition pseudo code of CNGResp Model

    column_per_MPI = C/NP                          ! columns owned by each MPI process
    column_per_OpenMP = column_per_MPI/NT          ! columns owned by each OpenMP thread
    steps = days * 24 * 2                          ! one step per 30 simulated minutes
    begin_index = 1 + mpi_rank*column_per_MPI      ! first column of this MPI process
    !$OMP PARALLEL NUM_THREADS(NT) PRIVATE(tid, pft_filter, num_pft, end_index)
    tid = omp_get_thread_num()
    ! last column of this thread (DO loop bounds are inclusive)
    end_index = begin_index + (tid+1)*column_per_OpenMP - 1
    DO i = 1, steps
      DO j = begin_index + tid*column_per_OpenMP, end_index
        call get_pft_filter(pft_filter, num_pft)
        call CNGResp(num_pft, pft_filter)
      END DO
    END DO
    !$OMP END PARALLEL

VI. OPENACC DIRECTIVE AND IMPLEMENTATION
OpenACC is a directive-based language extension for Fortran, C, and C++ that facilitates the simple and effective use of accelerators (e.g., GPUs) without sacrificing portability for non-accelerator systems. The Oak Ridge Leadership Computing Facility (OLCF) has made a strategic investment in OpenACC for the Titan system, and applications are starting to use it. However, OpenACC is a very young specification. Application scientists at ORNL have already identified a number of extensions to OpenACC that would significantly enhance its expressiveness and usability in their applications. Looking further forward, towards ExaScale computers, we see trends towards node-level environments with heterogeneous compute resources and more complex memory environments. Extending OpenACC to support such environments, with task-based execution, the ability to control placement of data in memory, and interoperability with other prominent node-level programming models will smooth the path for today’s


applications to make the transition to new ExaScale architectures, as well as preparing them for the jump to next-generation programming models and languages. One of the important missing features that OpenACC needs is support for multiple levels of memory copy. Data structures such as “struct” and “STL” containers are commonly used in C and C++ programming, but users are not allowed to copy such encapsulated data into GPU memory directly. Manually restructuring code to avoid dereferencing is a common way to handle those cases. However, this is a non-trivial process, especially when the code contains multiple levels of dereference. At the current stage, a very straightforward method was adopted in this study to evaluate the efficiency of OpenACC copy using our real scientific application. Specifically, we use the copy function to copy both the data structure and all the input datastreams for the CNGResp module from the clm_type data structure on the CPU, and we break the computational do loop and map the computation onto GPU cores. After the computation on the GPU, we use the copy function to move the output datastreams out of the GPU and update the data values within clm_type in CPU memory.

VII. MPI_OPENACC IMPLEMENTATION
The parallel partition scheme within our MPI_OpenACC implementation is very similar to that within MPI_OpenMP. Assume the number of columns is C and the number of MPI processes is NP; then each MPI process operates on C/NP columns. Since each computing node has a GPU, we use OpenACC to parallelize computation on each MPI process. If there are NT OpenACC threads, then each OpenACC thread operates on C/NP/NT columns. Algorithm 2 shows the pseudo code of the MPI-OpenACC implementation of the CNGResp model. The significant parts of the code include 1) explicit data structure copy (copy in), and 2) explicit data value copy (both in and out). Due to the limitations of the current PGI implementation, we are not able to use the “deep copy” and “routine” features, but we are very confident that our program will greatly benefit from these two features once they are implemented, because we can use the “routine” feature to make the code structure more concise and, most importantly, we can use the “deep copy” capability to copy only the input variables onto and the output variables off the GPU memory.

Algorithm 2 MPI_OpenACC based partition pseudo code of CNGResp Model

    column_per_MPI = C/NP
    steps = days * 24 * 2
    begin_index = 1 + mpi_rank*column_per_MPI
    end_index = column_per_MPI*(mpi_rank+1)
    !$ACC DATA COPYIN(struct) COPY(members)
    DO i = 1, steps
      !$ACC KERNELS
      !$ACC LOOP INDEPENDENT
      DO j = begin_index, end_index

VIII. COMPUTATIONAL EXPERIMENTS AND SCALABILITY ANALYSIS
In this section, we investigate the performance impact of the MPI-OpenMP and MPI-OpenACC implementations. The experiment case used in the benchmark is a fixed-size problem, since the size of the landscape surface data structure is already given. Therefore, a strong scalability experiment is the best option to present the performance speedup obtained by using MPI_OpenMP and MPI_OpenACC. In a strong scalability experiment, the problem size is fixed, while the number of OpenMP threads is increased.

A. Computational Platform
The computational platform used in this research is the Cray XK7 Titan supercomputer at the National Center for Computational Sciences (NCCS) at Oak Ridge National Laboratory (ORNL). Titan uses 16-core AMD Opteron central processing units (CPUs) in conjunction with NVIDIA Tesla K20X GPUs. It uses 18,688 CPUs paired with an equal number of graphics processing units (GPUs) to perform at a theoretical peak of 27 PetaFLOPS.

A center-wide Lustre file system provides 5 PB of disk space for all NCCS computing resources. The board configuration of the K20X GPU is as follows: processor clock, 732 MHz; memory clock, 2.6 GHz; memory size, 6 GB; memory I/O, 384-bit GDDR5; and memory configuration, 24 pieces of 64M ×16 GDDR5 SDRAM. According to the online K20X GPU document [9], the peak double precision floating point performance (board) can reach 1.31 teraflops, and the memory bandwidth for the board (ECC off) can reach 250 GBytes/sec. In our study, we used the PGI FORTRAN compiler (version 14.7.0), Cray-mpich (version 6.3.0), OpenMP (version 3.1) and CUDAtoolkit (version 5.5.20-1.0402.7700.8.1).

B. Single Node (Shared Memory System)
In the shared memory system benchmark, we demonstrate the performance impact of using OpenMP to parallelize computation. Figure 5 presents the strong scalability performance of the MPI-OpenMP implementation on 1 node. In ideal strong scaling, a program is considered to scale linearly if the speedup (in terms of work units completed per unit time) is equal to the number of processing elements used. In our case, we are not able to achieve linear speedup when the number of threads varies from 1 to 16.

However, we did observe a performance increase (a decrease in computation time) when more OpenMP threads were used. This is because the operations contained in the CNGResp subroutine are mostly floating point operations. The experiment case simulates 30 days (1 iteration simulates plant growth respiration over 30 minutes). There are billions of floating point operations in total. Since the AMD CPU contains 16 cores, when more OpenMP threads are used, less computation work is assigned to each CPU core, and therefore less time is spent on those floating point operations on each CPU
Compute CNGResp core. For example, when 16 OpenMP threads were used, the
END DO computation time of each core is less than 13.3 second, which
END DO gave out a speedup number of 6.3, as shown in Figure 5.
!$ACC END KERNELS
!$ACC END DATA
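The index arithmetic at the top of Algorithm 2 can be checked with a small sketch. The function below is illustrative only: the name `partition`, the even-divisibility check, and the example column count of 1024 are assumptions, not part of the original code.

```python
def partition(num_columns, num_procs, mpi_rank, days):
    """Return (begin_index, end_index, steps) for one MPI rank.

    Indices are 1-based and inclusive, mirroring Algorithm 2.
    Assumption: num_columns divides evenly by num_procs.
    """
    if num_columns % num_procs != 0:
        raise ValueError("this sketch assumes C is a multiple of NP")
    column_per_mpi = num_columns // num_procs
    steps = days * 24 * 2  # two 30-minute steps per simulated hour
    begin_index = 1 + mpi_rank * column_per_mpi
    end_index = column_per_mpi * (mpi_rank + 1)
    return begin_index, end_index, steps


# Hypothetical example: 1024 columns on 4 MPI processes, 30 simulated days.
for rank in range(4):
    print(rank, partition(1024, 4, rank, 30))
```

Each rank receives a contiguous, non-overlapping block of columns, which is what allows the inner loop to be marked LOOP INDEPENDENT.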

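The speedup figures quoted in this section follow from dividing the sequential time by the parallel time. The sketch below recomputes a few of them from the reported times; the function names are assumptions, and small differences from the published speedups are expected because the reported times are rounded.

```python
T_SEQUENTIAL = 88.9  # seconds, 1 MPI process and 1 OpenMP thread (from the text)


def speedup(t_sequential, t_parallel):
    """Strong-scaling speedup: sequential time over parallel time."""
    return t_sequential / t_parallel


def efficiency(t_sequential, t_parallel, processing_elements):
    """Parallel efficiency: speedup per processing element (1.0 is linear)."""
    return speedup(t_sequential, t_parallel) / processing_elements


# Times in seconds as reported in Tables 1 and 2.
reported = {
    "OpenMP, 1 node (16 threads)": 13.26,
    "OpenMP, 128 nodes": 0.12,
    "OpenACC, 1 GPU": 38.7,
    "OpenACC, 128 GPUs": 0.74,
}
for label, seconds in reported.items():
    print(f"{label}: speedup {speedup(T_SEQUENTIAL, seconds):.1f}")

print(f"16-thread efficiency: {efficiency(T_SEQUENTIAL, 13.26, 16):.2f}")
```

An efficiency well below 1.0 on 16 threads is consistent with the statement that linear speedup was not achieved on a single node.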

Fig. 5. Strong Scalability of CNGResp Model (MPI_OpenMP) running on a single node of Titan machine using one MPI process. The speedup varies by the number of OpenMP threads on the single computing node

C. Multiple Nodes (Distributed Memory System)

In the distributed memory test case, each MPI process occupies one computation node, and OpenMP is used to obtain more parallelism within each MPI process. Table 1 presents the strong scalability of the CNGResp Model running on multiple Titan nodes. On each computing node, there is one MPI process with 16 OpenMP threads. The computation time of the sequential implementation (using one MPI process and 1 OpenMP thread on a single node) is around 88.9 seconds, which is used as the baseline for the speedup calculation.

TABLE I. STRONG SCALABILITY OF CNGRESP MODEL (MPI_OPENMP) RUNNING ON UP TO 128 NODES OF TITAN MACHINE. ON EACH COMPUTING NODE (WITH 16 CPU CORES), ONE MPI PROCESS AND 16 OPENMP THREADS WERE USED. THE SPEEDUP WAS CALCULATED AGAINST THE COMPUTATION TIME OF SEQUENTIAL IMPLEMENTATION (1 MPI AND 1 OPENMP THREAD) ON SINGLE COMPUTING NODE (88.9 SECONDS)

# of Nodes (each has 16 cores)    Time (s)    Speedup (16 OpenMP threads)
1                                 13.26       6.7
2                                 6.22        14.3
4                                 3.07        29.0
8                                 1.52        58.6
16                                0.73        121.9
32                                0.41        217.1
64                                0.22        404.5
128                               0.12        741.7

As shown in Table 1, the model demonstrated good scalability up to 128 nodes, in which up to 128*16 = 2048 CPU cores were used for computation. In the simulation using a single computing node, the maximum computation time on each core is less than 13.26 seconds, while in the simulation using 128 nodes, the maximum computation time on each CPU core is less than 0.12 seconds.

Similarly, we have conducted the scalability experiment on those Titan nodes using the MPI_OpenACC implementation.

Table 2 shows the strong scalability of the CNGResp Model running on Titan with the OpenACC implementation. For comparison, the speedup of OpenACC is also calculated against the computation time of the sequential code on the CPU (around 89 seconds). Two facts are worth mentioning here: (1) the computation time of the OpenACC implementation on a single node (38.7 seconds) is shorter than that of the single-thread OpenMP implementation (88.9 seconds), but longer than that of the 16-thread OpenMP implementation (13.4 seconds). (2) Due to the limitation of the current deep copy feature, we have to use the standard copy function to move all input and output global variables and the data structure on and off the GPU. More detailed results are shown in the next section. Also, from the coding perspective, we found the OpenACC implementation very straightforward, so we think automatic instrumentation of these OpenACC directives into the CLM source is feasible and has great potential for further CLM parallel code development.

TABLE II. STRONG SCALABILITY OF CNGRESP MODEL (MPI_OPENACC IMPLEMENTATION) RUNNING ON UP TO 128 NODES OF TITAN MACHINE, EACH MPI PROCESS USES ONE K20X GPU. THE SPEEDUP IS CALCULATED AGAINST THE COMPUTATION TIME OF SEQUENTIAL CODE ON SINGLE CPU CORE (88.9 SECONDS)

# of GPUs    Time (s)    Speedup
1            38.7        2.3
2            20.76       4.3
4            10.56       8.5
8            5.5         16.4
16           2.98        30.2
32           1.72        52.3
64           1.05        85.7
128          0.74        121.6

As shown in Table 2, the model demonstrated very good scalability up to 128 nodes, in which up to 128 GPUs were used for computation. In the simulation using a single computing node, it took 38.7 seconds to finish all the computation using one GPU, which gives a speedup of 2.3, while in the simulation using 128 nodes, the maximum computation time on each GPU is less than 0.74 seconds, giving a speedup of 121.6.

IX. SYSTEMATIC PERFORMANCE ANALYSIS

In order to get more detailed information on the OpenACC implementation, we used the Vampir toolkit (www.vampir.eu) to trace and analyze detailed performance metrics on the GPU [10]. The Vampir toolkit consists of the runtime measurement system Score-P [12] and the performance analysis tool Vampir [11]. Score-P is a new, convenient measurement infrastructure for collecting performance data. It supports the developer with instrumentation and allows detailed logging of program execution for parallel applications using message passing (MPI), threads (OpenMP, Pthreads), and offloading to accelerators (OpenACC and CUDA). Score-P provides two commonly used techniques to investigate the performance behavior of parallel applications: profiling and tracing. Profiling is based on aggregating performance data, which allows a statistical view on a program run such as number of
invocations or accumulated time of functions or messages. Unlike profiling, the tracing approach does not summarize events. Tracing records all events of an application run that are of interest for later examination, together with the time they occurred and a number of event-type-specific properties. A trace file contains all recorded events in chronological order and therefore allows a timeline representation of the program execution. Trace files generated by Score-P can be analyzed with Vampir in a post-processing step. Vampir is a performance analysis tool that offers intuitive parallel event trace visualization with many displays showing different aspects of the parallel performance behavior. Vampir provides interactive zooming and browsing to show either a broad overview or details of the program behavior. Different timeline displays show application activities and communication along a time axis. Statistical displays provide quantitative results for arbitrary portions of the timelines. Powerful zooming and scrolling make it possible to pinpoint the real cause of performance problems. Vampir is designed to be an easy-to-use tool that enables developers to quickly display the program behavior at any level of detail.

Since tracing causes some instrumentation overhead, we used profiling to get an overview of the accumulated timing information of MPI, user regions, and CUDA kernel executions generated by OpenACC directives. This was very useful for determining the ratio of GPU to host computation. For more advanced performance analysis, we used tracing to visualize the dynamic runtime behavior in Vampir at any level of detail. Using tracing, we recorded exact time stamps for all GPU-related events, such as kernel execution on the assigned CUDA streams, fixed CUDA kernel metrics (threads per kernel, memory usage), host-device data transfers and synchronization, and GPU idle time. The Vampir analysis of the generated trace files helped us to understand and enhance the OpenACC implementation at scale by using different OpenACC directive combinations, which have an impact on CUDA kernel executions, host-device data transfers and synchronization.

X. RESULT DISCUSSIONS

In this section, we focus on the analysis of trace files for these computational experiments with the OpenACC implementation.

Figure 6 shows the master timeline of the simulation on a single Titan node. The MPI_Barrier, shown at the time mark of 11.00 seconds, marks the completion of model initialization. The data copy preparation started right after the MPI_Barrier and finished at the time mark of 11.25 seconds, when the data copy started. The actual GPU computation started at the time mark of 11.34 seconds; therefore, the data movement (320 MB) took only 0.09 seconds in total. The total GPU computation time was around 38.70 seconds.

Fig. 6. Vampir Timeline information on CNGResp Model running on single Titan node with one MPI process (Master thread) and one K20x GPU (CUDA[0:8])

The CNGResp kernel was automatically renamed after the caller function “run_test_acc” plus the line number (#462) where the OpenACC kernel directive was defined.

Figure 7 shows the trace information on our simulation using 4 MPI processes and 4 GPUs on 4 Titan nodes, including a master timeline, function summary, message information, as well as a close-up look at the data copy before the GPU computation. Again, the MPI_Barrier, which started and finished between the time marks of 2.83 seconds and 2.86 seconds, marks the completion of model initialization (total time of 11.23 seconds on all 4 nodes).

The data copy preparation started right after the completion of the MPI_Barrier (at 2.86 seconds), and all finished before the time mark of 3.08 seconds. Therefore, in total across the 4 nodes, about 0.88 seconds were used for data preparation and data copy (including some extra idle time on each node). The actual data copy operation only took about 0.02 seconds on each node. The actual GPU computation started around the time mark of 3.08 seconds, and the total time of GPU computation was about 39.83 seconds. In total, around 600 MB of data were moved in and out of the GPU devices.
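The single-node timeline reported for Figure 6 can be turned into phase durations and an effective transfer rate with a few lines of arithmetic. This is a sketch under stated assumptions: the function names are hypothetical, and the rate is simply the reported 320 MB divided by the reported copy window, not a measured bandwidth.

```python
def phase_durations(barrier_mark, copy_start_mark, compute_start_mark):
    """Split the window before GPU computation into preparation and copy time (s)."""
    preparation = copy_start_mark - barrier_mark
    copy = compute_start_mark - copy_start_mark
    return preparation, copy


def effective_rate_mb_per_s(megabytes, seconds):
    """Average transfer rate implied by a data volume and a time window."""
    return megabytes / seconds


# Time marks (seconds) and data volume as reported for the single-node run.
preparation, copy = phase_durations(11.00, 11.25, 11.34)
rate = effective_rate_mb_per_s(320, copy)
copy_share = copy / (copy + 38.70)  # copy time relative to copy plus GPU compute

print(f"preparation {preparation:.2f} s, copy {copy:.2f} s")
print(f"effective transfer rate {rate / 1000:.1f} GB/s (taking 1 GB = 1000 MB)")
print(f"copy share of copy + compute time {copy_share:.2%}")
```

Under these numbers the explicit copy accounts for well under one percent of the GPU phase on a single node, which is why the data movement itself is described as taking "only" 0.09 seconds.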


Fig. 7. Vampir visualization of CNGResp Model running on 4 Titan nodes with 4 MPI processes and 4 K20x GPUs. The main timeline is shown in the top panel.
A small segment of the data movement is shown in a close-up window on the bottom. One can clearly see that the GPU is only idle when there is no data transfer
or kernel execution. The Function Summary (left middle) shows that almost 75% of the GPU is fully occupied. The Function Summary (right middle) shows
profile information of all functions. A Function Legend is shown on the left bottom panel. A Message Summary (left bottom) shows that 605MiB of data were
copied between host and GPU device

Here we also list some trace information from our computational experiment across 128 Titan nodes using 128 MPI processes and 128 GPUs. During the simulation, the MPI_Barrier, which started and finished between the timestamps of 0.23 seconds and 0.25 seconds, marks the completion of model initialization (total time of 30.72 seconds on all 128 nodes). The data preparation and copy operations started right after the completion of the MPI_Barrier (at 0.25 seconds), and all finished before the timestamp of 0.53 seconds, when the GPU computation started on all GPUs. Therefore, in total across the 128 nodes, about 35.84 seconds were used for data preparation and data copy (including significant extra idle time on each node). The actual data copy operation still only took about 0.02 seconds on each node. The actual GPU computation started around the timestamp of 0.53 seconds and ended around the timestamp of 1.01 seconds. Therefore, the total time of GPU computation summed over all 128 nodes was about 61.44 seconds. Again, in total, around 600 MB of data were moved in and out of the GPU devices.

XI. CONCLUSIONS AND FUTURE WORK

We have presented our objectives, methods and case study to investigate the feasibility of porting the CLM key data structure and simplified data flow onto accelerators using the copy feature of OpenACC. There is clearly room for further OpenACC performance improvement, especially related to selective data movement and code rewriting using the “routine” feature. Considering the huge software complexity of the CLM code, and the continuous code changes from active model development, we view the high-level programming directive approach using OpenACC as being of great interest. Based on our previous work (such as interactive CLM structure exploration, CLM functional testing code generation, data stream identification, compiler analysis and other preliminary CLM code migration preparations), we think this is a very useful first step toward porting CLM onto pre-exascale computers using the OpenACC approach. Further investigations are needed, especially of implementations using the “deep copy” function and “routine” features. We are also conducting similarity-based analysis for CLM [13,14], which in turn gives us more information on porting individual kernels onto GPUs. We believe that our experience from this pilot study on porting modular environmental models can be beneficial to many other scientific research programs that adopt high-level programming directives to port scientific applications onto hybrid high-end computers.

ACKNOWLEDGMENT

This research was partially funded by the Terrestrial Ecosystem Sciences (TES) Program and the Climate Sciences for Sustainable Energy Future (CSSEF) Program under the Biological and Environmental Research (BER), Office of Science of the U.S. Department of Energy (DOE). This research used resources of the Oak Ridge Leadership Computing Facility, located in the National Center for Computational Sciences at Oak Ridge National Laboratory, which is managed by UT-Battelle LLC for the Department of Energy under contract DE-AC05-00OR22725.


REFERENCES

[1] Washington, W. M., C. L. Parkinson, 2005, An Introduction to Three-Dimensional Climate Modeling, University Science Books, 2nd edition.
[2] Oleson, K., Lawrence, D., Gordon, B., Flanner, M., Kluzek, E., Peter, J., Levis, S., Swenson, S., Thornton, P., and Feddema J., 2010, Technical description of version 4.0 of the Community Land Model (CLM).
[3] Wang, D., Post, W., Wilson, B., 2011. Climate Change Modeling: Computational Opportunities and Challenges, IEEE Computing in Science and Engineering, Vol 13(5), pp36-42
[4] Wang, D., Schuchart, J., Janjusic, T., Winkler, F., and Xu, Y. 2014a. Toward Better Understanding of the Community Land Model within the Earth System Modeling Framework, International Conference on Computational Science, Cairns, Australia, 2014,
[5] Domke, J., Wang, D., Runtime Tracing of the Community Earth System Model: Feasibility Study and Benefits, 12th Workshop on Tools for Program Development and Analysis in Computational Science, Omaha, Nebraska, June 2012, procedia CS 9: pp1950-1958, 2012
[6] Xu, Y., Wang, D., Janjusic, T., Xu, X., A Web-based Visual Analytic System for Understanding the Structure of Community Land Model, International Conference on Software Engineering Research and Practice, June, 2014.
[7] Wang, D., Xu, Y., Thornton, P., King, A., Gu, L., Steed, C., Schuchart, J., 2014b. A Functional Testing Platform for the Community Land Model, Environmental Modeling and Software, Vol. 55, pp25-31, 10.1016/j.envsoft.2014.01.015
[8] Oscar Hernandez, Wei Ding, Barbara Chapman, Ramanan Sankaran, Richard L. Graham, Christos Kartsaklis, "Experiences with High-Level Programming Directives for Porting Applications to GPUs". Facing the Multicore - Challenge II. Lecture Notes in Computer Science, Volume 7174/2012, pages 96-107, 2012
[9] Tesla K-Series Overview, available online at http://www.nvidia.com/content/tesla/pdf/Tesla-KSeries-Overview-LR.pdf
[10] Dietrich, R., Winkler, F., William, T., Stolle, J., Henschel, R., & Berry, D. K. (2013). A Case Study: Holistic Performance Analysis on Heterogeneous Architectures using the Vampir Toolchain. In PARCO (pp. 793-802).
[11] A. Knüpfer, H. Brunst, J. Doleschal, M. Jurenz, M. Lieber, H. Mickler, M. Müller and W.E. Nagel, “The Vampir Performance Analysis Tool-Set”, Tools for High Performance Computing, pp 139-155, Springer Verlag, 2008
[12] A. Knüpfer, C. Rössel, D. an Mey, S. Biersdorf, K. Diethelm, D. Eschweiler, M. Gerndt, D. Lorenz, A. D. Malony, W. E. Nagel, Y. Oleynik, P. Saviankou, D. Schmidl, S. Shende, R. Tschüter, M. Wagner, B. Wesarg, F. Wolf: Score-P - A Joint Performance Measurement Run-Time Infrastructure for Periscope, Scalasca, TAU, and Vampir, Proceedings of 5th Parallel Tools Workshop, 2011
[13] Ding, W., Hsu, C. H., Hernandez, O., Chapman, B., & Graham, R. (2013). KLONOS: Similarity‐based planning tool support for porting scientific applications. Concurrency and Computation: Practice and Experience, 25(8), 1072-1088
[14] Wei Ding, Oscar Hernandez, and Barbara Chapman. "A Similarity-Based Analysis Tool for Porting OpenMP Applications". In Facing the Multicore-Challenge III, pp. 13-24. Springer Berlin Heidelberg, 2013.


B2C E-Commerce Fact-Based Negotiation Using Big Data Analytics and Agent-Based Technologies
Hasan Al-Sakran
Management Information Systems Dept
King Saud University
Riyadh, Saudi Arabia

Abstract—The focus of this study is the application of intelligent agents in negotiation between buyer and seller in B2C commerce using big data analytics. The developed model is used to conduct negotiations on behalf of prospective buyers and sellers, using analytics to improve negotiations to meet the practical requirements. The objective of this study is to explore the opportunities of using big data and business analytics for negotiation, where big data analytics can be used to create new opportunities for bidding. Using big data analytics, sellers may learn to predict the buyers' negotiation strategy and therefore adopt optimal tactics to pursue results that are in their best interests. An experimental design is used to collect intelligent data that can be used in conducting the negotiation process. Such an approach will improve the quality of negotiation decisions for both parties.

Keywords—negotiations; e-commerce; agent technology; big data; analytics

I. INTRODUCTION

Affordability of smart mobile devices with permanent connection, social networks and real-time conversation streams has significantly changed B2C e-commerce. Whereas some time ago negotiating parties had little or no knowledge of attributes and their values, now such information can be retrieved from multiple sources online. Negotiation is one of the major components of many e-commerce activities, such as auctions, scheduling, contracting, and so on, and is one area that can greatly benefit from intelligent automation. We consider negotiations as a form of interaction between parties with conflicting goals who wish to cooperate in order to reach an agreement that will benefit all negotiating parties, a process that can be both complicated and time-consuming.

E-commerce negotiation is a decision-making process that seeks to find an electronic agreement that will satisfy the requirements of two or more parties in the presence of limited information and conflicting preferences [1]. In e-commerce negotiations, buyers and sellers search for possible solutions until agreement is reached or negotiations fail. Both buyers and sellers can conduct their own utility assessment for every solution. The goal of negotiation is to seek a solution that optimizes utility value for both of them.

Due to the recent technological advances mentioned above, all organizations involved in B2C commerce are forced to improve existing services and develop new ones to retain old customers and attract new ones. Customers negotiate for better deals, and e-commerce business organizations negotiate in order to keep their customers, to build lasting relationships, and to increase customer satisfaction. Negotiation is one such service. In view of the increased role of negotiations in B2C commerce, it is appropriate to give this particular topic the attention it deserves. Negotiation can significantly benefit from big data analytics. On one side, using analytics will allow businesses to shorten negotiation time and the effort associated with it. On the other side, it will help customers lacking knowledge of negotiation procedures and negotiation skills.

The success of e-negotiation in B2C commerce depends on the volume of provided data and information, and on how they are used to optimize the negotiation operations. The size of the data is big enough to extract huge volumes of valuable knowledge that may determine a firm's success or failure [2]. Using big data analytics, a seller may learn to predict the buyer's negotiation strategy and develop and adopt optimal tactics to achieve results that are in his best interests. The ability to manage and transform data into useful information and utilize it as a strategic differentiator is a key contributor to the success of B2C negotiation. The B2C negotiation process must be designed to take advantage of the large volumes of consumer data that have become available in recent years due to the Internet, social networking, mobile telephony applications, RFID and sensor applications, and new technologies that create and capture data, the size of which is growing exponentially. Collected data are mainly unstructured and contain valuable customer opinion and behavioral information. Big data analytics can be defined as integrated technologies, techniques, practices, methodologies, and applications that analyze critical business data to help an organization better understand its business and make real-time decisions [3].

Despite the large number of articles in this area, there has not been enough academic research on effective ways to leverage big data to create meaningful information for e-commerce negotiations. The proposed model allows negotiators to engage simultaneously in multi-party negotiations. This agent-based e-negotiation system incorporates big data analytics technologies to carry out goal-driven multi-party negotiations on several issues at the same time and to support vital negotiation mechanisms.

Using big data analytics, the seller agent (SA) will be able to predict the price a customer has in mind and find out what's included in other companies' offers in order to negotiate from a
position of strength. Agents can do their research within a given price range and estimate the profit a business will gain. The SA will predict profitability based on different variables. These variables include original price, available quantity, delivery time and other attributes. Based on that data, the SA will derive the best initial asking price and the walk-away price on the spot in order to maximize profit.

In this work, a framework architecture for the e-commerce negotiation application is developed as one of the services that can be provided by B2C organizations in order to achieve greater satisfaction of online and mobile customers. To achieve this goal, the author integrates intelligent agent technology with big data analytics in an intelligent negotiation model.

This paper is structured as follows. In section 2 the related research works in e-negotiation are presented. The framework of the proposed mobile negotiation system is given in section 3. Section 4 concludes this paper and gives the general direction of future research.

II. RELATED WORK

More and more business processes are becoming electronic. This has quickly become a part of our life and does not surprise us anymore. Anyone can see the advantages of e-commerce. It simplifies our life and changes the whole concept of business. Some areas of business, though, are still resistant to change due to their specifics.

The majority of business negotiations represent one such area. Traditional or partially automated negotiations cannot meet the needs of increasingly frequent electronic trading. An automated business negotiation process will improve the efficiency of e-commerce, minimize costs and promote its further development.

During the last two decades negotiations have been studied extensively. One of the methods most commonly applied in e-commerce negotiations is artificial intelligence (AI). Different AI approaches have been developed and deployed for research, training and other purposes, based on such methods as game theory, Bayesian networks, evolutionary computation, and distributed artificial intelligence models. Most of the earlier negotiation models were built under fixed and often mismatched assumptions and are thus inappropriate for real-life electronic negotiations; they are based on complex computations and require high computational power and large memory, especially when multiple attributes are involved. Several online negotiation applications have been developed and implemented. The majority of these applications were one-site Negotiation Support Systems and required human participation.

Various existing e-marketplaces employ e-negotiation applications based on intelligent agent technologies. Unfortunately, market agents trade only by price [1, 4], while in the real world negotiations are conducted not only by price, but often involve multiple issues (e.g., price, quantity, product quality) [5, 6]. Matos and Madeira in their work [7] propose an automated negotiation model between two participants for m-commerce, which uses mobile agents and considers mobile device personalization through the use of profiles in the negotiation. Buying and selling agents conduct price negotiation on behalf of the negotiating parties. Heavy-weight calculations are performed on a fixed network. Kattan et al. [8] studied an agent-based model for negotiation using genetic algorithms to investigate outcomes of negotiation. An agent-based negotiation system based on artificial neural networks to generate counter-offers, proposed in [9], exploited trading experiences containing negotiators' preferences. Bala [10] and Rajesh [11] proposed a CBR-based distributed multi-agent electronic negotiation system where previous similar cases were retrieved from a case base, revised and used to develop new offers/counter-offers, to aid in selecting an appropriate negotiation strategy.

Ronglong et al. [12] proposed an agent-based negotiation model that employed a Bayesian learning method. Bjelica and Petrović in [13] built a three-party QoS negotiation model for future mobile networks, but in the negotiation procedure they proposed, the user accepts the first "good" service offer, possibly missing the "best" one. Fu and Nie [14] developed an improved PSO (particle swarm optimization) algorithm which automatically computes an optimal solution to maximize both buyers' and sellers' payoff. Bruns and Cortes [15] used a negotiation strategy defined in terms of sub-negotiations with internal or external agents in their hierarchical model of service negotiation, where a complex negotiation strategy was decomposed into manageable components having well-defined scope.

Li and Zhong [16] performed an analysis of the negotiation protocol, negotiation strategy, negotiation flow and negotiation evaluation that allowed them to develop a new mobile commerce negotiation model implementing new negotiation algorithms and new negotiation evaluation methods. A multi-strategy selection model capable of handling dynamically changing negotiation situations was developed by Cao and Dai [17] and Hindriks et al. [18]. A trusted negotiation broker framework for adaptive intelligent bilateral bargaining, with well-defined mathematical models to map business-level requirements and an algorithm for adapting the decision functions during an ongoing negotiation, has been designed by Zulkernine and Martin [19].

III. B2C E-COMMERCE NEGOTIATION

Negotiation occurs when two or more counterparts are trying to accomplish a deal that satisfies all participating parties. It is a decentralized decision-making process of achieving a compromise in the presence of incomplete information and contradictory preferences. As the e-commerce environment provides access to a much larger community of buyers and sellers, the possibility of better deals emerges for all participating parties, both businesses and customers.

The proposed agent-based negotiation model utilizes big data analytics techniques to identify the best initial offer and adopts multiple-criteria decision making in the utility function to evaluate offers.

The following assumptions have been made:

• Messages are exchanged between two parties to convey offers/concessions.

• Messages are encrypted to protect the privacy of negotiating parties.


• All negotiation activities must be conducted electronically, which allows for transparency of the negotiation process.

• Negotiation applications can be used in online/offline mode.

• In case of disconnection, execution of the application resumes at the same point.

Goals of the proposed system include helping negotiating parties to derive an initial asking price and concessions, develop efficient strategies, minimize mistakes, etc., especially for those lacking knowledge of negotiation processes. The proposed system employs software agents to resolve these issues [20].

Buyers' data, such as price, quality, delivery time, etc., are entered by means of an interface agent and stored in the buyer's profile on a mediator site. Applicable constraints on such attributes may be included at that point. The buyer's request is carried by mobile agents to a mediator at a fixed location. Missing information on some attributes can be retrieved from DW by a mediator.

A negotiation process is considered a combination of one-attribute negotiation processes. Negotiable attributes may differ from one buyer to another and may include all or some of the following: price, quality, delivery time, guarantee period, specific constraints, and other features important to a buyer. It is assumed that all attributes are negotiable.

Exchange of offers and counteroffers is an iterative process that step by step leads to a compromise acceptable to both negotiating parties. Private information of both parties, such as negotiation strategies and constraints on negotiable attributes, is hidden and must not be disclosed. Opponents' negotiation strategies can be deduced from a sequence of their concessions.

S-agents represent the seller's interests. The M-agent is usually located on a desktop of seller and buyer devices or on a server and acts as a mediator between S-agents (seller) and B-agents (buyer).

An agent framework and a development environment are needed in order to build the proposed system. The infrastructure supports the interactions between agents that might be geographically dispersed on the Internet. The architecture has a 3-tier structure: the buyers' mobile and fixed devices, a mediator, and seller negotiation systems. The first tier is a buyer device (wireless or fixed), equipped with an interface and intelligent mobile agents installed on the buyer devices, which help communicate with the system and act as personal assistants to the buyers. Wireless buyers' access to the e-commerce services is facilitated by a mediator employing multiple mobile agents that search for potential sellers. The final decision is usually made by a buyer and is based on recommended offers delivered by the buyer's negotiation agent.

The second tier consists of a mediator site whose functions are:

• Collecting the buyer's data from the client's agent;
• Filling in the buyer's profile;
• Generating an offer;
• Generating mobile agents on behalf of each mobile client;
• Evaluating incoming offers, selecting the best and continuing negotiation with the seller that made this best offer;
• Content adaptation.
The intermediary server is controlling the adaptation
Big data analytics system is used to derive an initial offer. process to meet the user preferences and supports mobile
Negotiator agent (NA) delivers the generated offer to other devices with different capabilities and limitations, and diverse
participating parties. As an offer is received by a negotiating wireless technologies used by users. It is in charge of content
party, it‘s evaluated, a counteroffer is generated, send back, delivery.
and so on, until negotiation either succeeded or failed. If the
mobile agent of the buyer accepts the price offered by the seller Fixed Buyer
Seller 1
mobile agent, then the negotiation process is completed. Then Interface B-Agent
agents return to the place of origin, where data are evaluated,
Mobile B-Agent
the best counteroffer selected and delivered to prospective
Seller 2
buyer in a suitable form. If the initiator of negotiation accepts
it, the negotiations are concluded. If not, the user will have two Mediator
options: either quit the negotiation or start the new process with M-Agent Server
re-adjusted attributes.
Presentation Agent
IV. E-COMMERCE NEGOTIATION ARCHITECTURE
Negotiator Agent
Negotiation system made of several agents: interface agent,
agent server, presentation agent, buyer and mediator mobile
agents (both fixed and wireless), seller and buyer negotiation Wireless Buyer
agents. Functions of each agent are described below.
Interface WB-Agent Seller N
The design of the e-negotiation system, needed to assist
buyers in searching for prospective counterparts, acceptable Mobile B-Agent
offers, negotiating terms, and finalizing deals, is based on the
architecture developed in our previous works [20, 21] and
shown on Fig. 1. Fig. 1. Negotiation Framework Architecture
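
The iterative exchange of offers and counteroffers described in Section III can be illustrated with a minimal sketch for a single attribute (price). This is not the authors' implementation: the `Party` class, the fixed concession fraction `step`, and the midpoint settlement rule are assumptions made only for illustration. The starting values and limits are taken from the Buyer and Seller1 rows of Table I.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Party:
    """One side of a single-attribute (price) negotiation (hypothetical helper)."""
    offer: float   # current offer on the table
    limit: float   # least acceptable value; the offer never crosses it
    step: float    # assumed fraction of the remaining gap conceded per round

    def concede(self) -> None:
        # Move the offer part of the way toward the party's own limit.
        self.offer += self.step * (self.limit - self.offer)


def negotiate(buyer: Party, seller: Party,
              max_rounds: int = 50) -> Optional[Tuple[int, float]]:
    """Return (round, agreed price), or None if no agreement is reached."""
    for round_no in range(max_rounds + 1):
        if buyer.offer >= seller.offer:
            # Offers have met or crossed; settle at the midpoint (assumption).
            return round_no, (buyer.offer + seller.offer) / 2
        buyer.concede()
        seller.concede()
    return None  # deadline reached without agreement


# Buyer: start 500, max 780. Seller1: start 1000, min 750 (Table I).
deal = negotiate(Party(500.0, 780.0, 0.3), Party(1000.0, 750.0, 0.3))

# If the buyer's limit lies below the seller's limit, the offers never meet.
no_deal = negotiate(Party(500.0, 700.0, 0.3), Party(1000.0, 750.0, 0.3))
```

Because each side only moves toward its own limit, any agreed price falls between the seller's minimum and the buyer's maximum; when those ranges do not overlap the loop ends in failure, mirroring the "quit or restart with re-adjusted attributes" outcome in the text.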


The server holds stationary agents (administrator and presentation agents), user profiles, and a device specifications database. A user profile is created when a user requests a negotiation service for the first time and contains the client's device specifications. A presentation agent specifies which presentation type is most appropriate to the user according to a predefined set of rules. In this way, each buyer can receive adaptive content that meets his preferences and is compatible with his mobile device and the wireless technology used.

The third tier is a seller negotiation system whose functions are:
• Assigning the weight of each negotiation attribute;
• Selecting the concession strategy to be used;
• Evaluating the buyer's offer;
• Creating a counter-offer.

Mobile agents are used as a means of communication between the tiers, and distribution of the resources is managed across this system architecture.

A. Seller Module

The Seller Negotiation module consists of a Knowledge Base (KB) that specifies a set of rules to derive advice for negotiators, different negotiation mechanisms, and a Big Data system consisting of a data warehouse of e-market data and analytic tools integrating text mining (e.g., information extraction, topic identification, question-answering), web mining, social network analytics, and existing databases. Fig. 2 illustrates how big data and business analytics can be used to support negotiation. These tools are used to analyze all types of marketing data using sophisticated quantitative methods such as data mining, statistics, predictions, forecasting, visualization, and optimization.

Customers also have access to these sources of data; thus businesses have a unique opportunity to influence customers' opinions and behavior, understand the likelihood of a customer's willingness to spend money in a certain product category, optimize price for better profitability, and increase the competitive edge of the organization over competitors [22 - 25]. Using big data analytics, a seller may learn to predict the buyer's negotiation strategy and therefore adopt optimal tactics to attain results that are in his best interests. Information on past selling instances is stored in the data warehouse, and the classification analytics tool will select an instance that has the highest similarity with the current selling situation. Once such an instance is identified, a price offer can be made based on the price information attached to the selected instance.

The content of the DW system includes data on specific negotiation circumstances, negotiating parties' profiles, results of negotiations (success, failure, and terms of the reached agreements), negotiation strategy, etc.; in other words, any relevant information that can be used to derive a sequence of concessions made by both negotiating parties, and so on.

[Figure: diagram with blocks labeled Seller Agent, KB, DW, Analytics Tools, and Sources of Data (web mining, social networks, internal and external DB, other sources).]

Fig. 2. Seller Negotiation System

The seller negotiation system retrieves and analyzes big data to generate advice on offer calculations. The negotiating agents' behaviors are built on these analytical results. Each agent has an inference mechanism based on the rule-based system located in the seller's knowledge base. For more complex knowledge processing, powerful analytics tools may be used. The agents' cooperation helps to detect various offer conditions which, in turn, assist decision makers in their negotiation process. We can improve the negotiation process by applying a methodology proposed by Lee and Hsu [26] to predict the negotiation strategy used by the buyer through the calculation of the relative concession rate.

B. Buyer Module

The buyer module houses the interface and buyers' mobile agents.

1) Interface Agent (IA): A buyer enters the necessary information through an interface agent. This information is stored in the buyer's profile and contains such data as price, quality, delivery time, guarantee period, etc. If a user has no information on some attributes, he has the option to perform the search himself or to delegate this job to an agent server. Recorded preferences are delivered to a mediator by the buyer's mobile agent.

2) Buyer's Mobile Agent: represents the buyer's interests and delivers the buyer's negotiation initiation request to the agent server, where such a request will be processed. Note that buyers can be at a fixed location or on the move.

C. Mediator Module

The architecture of the Mediator Negotiation system has the following components:

1) Agent Server (AS): distributed, intelligent. Its roles include provision of a standard interface to other agents, managing resources to satisfy requests of the buyer's agent, etc. An agent server performs the following main tasks:
• Creating and maintaining an execution environment and protection and regulatory mechanisms for agents;
• Facilitating migration of agents' code;
• Monitoring agents' actions;
• Allowing co-existence of and communications between agents working on the same negotiation;


• Prohibiting direct interference and, in general, any kind of communication between agents of different buyers in order to avoid sharing confidential information on negotiation strategies, constraints, negotiation status, etc.;
• Handling communications with other servers and accessing available services through them.

2) Presentation Agent (PA): With the emergence of heterogeneous devices, content adaptation became unavoidable. Its main goal is to enable the presentation of digital content on different mobile devices. The device context is information that is used to characterize a user's mobile device. It includes some of the main parameters that characterize mobile devices, such as device type and device screen resolution. Nowadays mobile devices can be connected to the Internet via different wireless technologies, each with a different data transfer rate.

As a result, we have to specify the type of wireless technology that will be used by the user to connect his device to the Internet. The layout structure, image size, and font size may not be suitable for presentation on a portable device, so a presentation agent dynamically creates new images based on the originals. Using different media conversion tools for text and images, this agent develops new content based on recognition of device characteristics such as mobile device type (notebook computers, PDAs, smart phones or cell phones); type of operating system (Apple OS X, Blackberry OS, Windows Mobile, Palm OS, etc.); type of format; web browser; network type; and upload and download speed of the mobile device.

3) Mediator's Mobile Agent: can move from one system to another. Mobile agents are generated dynamically during execution. They can reconfigure themselves dynamically based on changes of the services.

An offer, which will be presented to the negotiation system, is built based on the user's preferences accepted by an interface agent and subsequently passed to an agent server. The agent server creates mobile negotiator agents whose job is to carry offers to prospective buyers. A negotiator agent above all contains an offer to be delivered to counterparts, and an address, explicitly specified by a client or provided through search. Each agent engages in bilateral negotiations, exchanges offers/counter-offers with the other party, evaluates counter-offers, and so on, until either a preliminary agreement has been reached or negotiations have failed. In both cases a negotiator agent returns to the mediator and informs it of the results. The agent can make a better decision when it learns more about its counterpart. However, the reasoning strategy of the agent may change with accumulating knowledge as the negotiation goes on. The best outcome is selected and presented to the buyer. If the buyer accepts the final agreement, it is finalized and the negotiation process is considered completed. If not, the negotiation is considered a failure.

V. PROPOSED NEGOTIATION MODEL

Negotiation issues (attributes: i = 1, 2, …, n) to be agreed on by both buyer and seller are the decision objects that the negotiation agents use to negotiate. Each attribute (i) has three different values: for a seller, a maximum value (Aimax), which is the asking or starting point, a lowest acceptable value (Aimin), and the best expectation value (Ais) of the negotiation; for a buyer, a highest acceptable value (Aimax), the best expectation value (Aib) of the negotiation, and a minimum value (Aimin), which is the starting point. Early predictions of the values of these supportable solution variables are discovered and captured from the analytics of big data, which depends on context and situation. These attribute values help in the calculation of relative concession rates. Attributes with the same values can't be negotiated. Each attribute is associated with a weight (wi), which reflects the importance of the negotiation attribute. Both buyer and seller decide the weight of each attribute according to their preferences.

Description of the Fact-Based E-negotiation model: initially, buyer and seller assign the weight of each negotiation attribute, choose the concession strategy (anxious, careful, or greedy type [27]), and submit them to their negotiation agents. Both the concession strategies and the attribute weights of each side are unknown to the other side. The values of negotiation attributes are delivered to the relevant opponent agent. The objective of e-negotiation is to maximize the utility function, and the worst case should not make the utility function value lower than a predefined one; otherwise the negotiation process should be terminated. In every negotiation round, the SA will estimate the buyer's intention and forecast his acceptance probability. The seller agent must calculate its own evaluation function, and then determine its actions and refresh its parameters for the next round. In each negotiation round, the negotiation agent (either the buyer's or the seller's) receives an opponent's offer, checks whether it is within its expectation, and then decides whether to accept, reject, or continue the negotiation. When the process continues, one side changes its bid to show a motivation to compromise and continues negotiation with the other side. The latter evaluates the proposal of the opponent and decides whether to accept it or not. If the opponent rejects the proposal, he adjusts the attribute value, generates a counter-proposal, and returns it to the bidder. The process continues until the attribute values reach a balance where both sides accept the proposal, or one or both sides reach their least acceptable limit and the negotiation therefore fails.

In order to measure the merits of a negotiation proposal, the utility of the current proposal must be calculated. The utility function is given below.

In each round the seller negotiation agent calculates the total utility (Tsu) value:

\[ T_{su} = \sum_{i=1}^{n} w_i \, C_{is} \]


where: wi is the weight of each attribute; Cis is the seller concession rate between two consecutive negotiation rounds (t) and (t-1) of attribute (i).

[equation not recoverable from the extracted text]

where the two offer terms are the current (t) and previous (t-1) offers for negotiation attribute (i), respectively.

[equation not recoverable from the extracted text]

where: Ai,t is the value of attribute (i) at round (t); Ai,t-1 is the attribute value in the previous round; and Aimin is the lower limit not to be exceeded.

[equation not recoverable from the extracted text]

The seller utility evaluation function evaluates the value of each negotiation attribute (i) in each negotiation round (t). At the beginning of a negotiation the utility function is set to its maximum value, which usually equals 1. When the negotiation time reaches the deadline, the target utility should be decreased to the least acceptable value that the seller agent can accept.

At the buyer side, the buyer negotiation agent calculates the total utility (Tbu), which represents the maximum level the buyer is willing to pay for related attributes or the minimum level the buyer wishes to accomplish for important related attributes.

\[ T_{bu} = \sum_{i=1}^{n} w_i \, C_{bi} \]

where: wi is the weight of each attribute; Cbi is the buyer's concession rate between two consecutive negotiation rounds (t) and (t-1) of attribute (i).

[three equations not recoverable from the extracted text]

If the buyer is negotiating with a number of sellers at the same time to buy the same items, then the buyer adjusts his offer based on the overall information received from all sellers' agents.

If the seller agent accepts the counter-offer, then the deal is completed. If it is rejected, the buyer agent may adjust the offer by decreasing its goal utility for the next round of negotiation until the process is completed with an agreed deal or failure. If a viable buyer is not willing to agree to the bottom line (best expectation value or least acceptable value), then switching strategy is recommended.

VI. EXPERIMENT

A simulation prototype was developed using the Java Agent Development Framework as the platform to simulate the actual operation of multi-agent negotiation. Based on the definitions of the proposed negotiation model, the data shown in Table I are used to test the model.

TABLE I. ATTRIBUTE VALUES

User      Price P ($$)         Quantity Q      Time to Deliver TD (days)
          Start     Max        Start   Max     Start   Max
Buyer     500.00    780.00     10      15      7       14
          Start     Min        Start   Min     Start   Min
Seller1   1000.00   750.00     20      13      21      13
Seller2   1200.00   732.76     5       5       18      10
Seller3   1300.00   1000.00    20      10      14      7

In this simulation example, one buyer's agent B negotiates with three sellers' agents, S1, S2, and S3, separately. There is one item for negotiation. The attributes of the item are price, quantity, and delivery date. In the first round the agents of the sellers are initialized according to the recommendations from their business analytics systems, as shown in Table I. In this round both buyer and sellers provide the weight of each attribute according to their preferences. In this example we consider the scenario in which the sellers use business analytics while the buyer does not. If the buyer were provided with help from a Web-based negotiation support system, he would be in a better negotiation position. In each round the buyer takes advantage of the information received from negotiations with sellers to adjust his offer and counteroffers. The buyer hides the negotiation information of each seller from the other sellers, which gives him more negotiation power.

At the end of round one, the sellers do not accept the initial proposal of the buyer, and further negotiation is needed. Based on the weight of each attribute, buyer and sellers adjust the proposal values of price, quantity, and delivery time. The proposal values of each negotiation round are shown in Table II.

After 11 rounds of negotiations, the buyer accepts the proposal of seller S2 because it offers the best acceptable price compared with those of sellers S1 and S3 (739.76 compared with 748.11 for S1 and failure for S3), while the quantity and delivery time attributes have similar values. The negotiations with the other sellers are terminated.

Graphical representations of the negotiation results can be seen in Fig. 3 for price, Fig. 4 for quantity, and Fig. 5 for delivery time.
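
The weighted total utility of Section V, a sum over attributes of weight times concession rate, can be sketched as follows. The concession-rate equations did not survive extraction, so `relative_concession` below uses an assumed form (the share of the remaining room toward the limit that is given up in one round) and should not be read as the authors' formula. The weights are hypothetical; the attribute values come from the Seller1 entries of Tables I and II.

```python
from typing import Sequence


def total_utility(weights: Sequence[float],
                  concession_rates: Sequence[float]) -> float:
    """Weighted sum over attributes: T = sum_i w_i * C_i."""
    if len(weights) != len(concession_rates):
        raise ValueError("one weight is required per attribute")
    return sum(w * c for w, c in zip(weights, concession_rates))


def relative_concession(current: float, previous: float, limit: float) -> float:
    """ASSUMED form: fraction of the room between the previous offer and
    the limit that was conceded in this round."""
    room = abs(previous - limit)
    if room == 0:
        return 0.0  # nothing left to concede
    return abs(current - previous) / room


# Hypothetical weights for price, quantity, and delivery time.
weights = [0.5, 0.3, 0.2]

# Seller1 moving from round 0 to round 1 (Table II), limits from Table I.
rates = [
    relative_concession(999.00, 1000.00, 750.00),  # price
    relative_concession(19.00, 20.00, 13.00),      # quantity
    relative_concession(19.00, 21.00, 13.00),      # delivery time
]

t_su = total_utility(weights, rates)
```

Keeping the weighted sum separate from the concession-rate helper means a different rate definition can be substituted without touching the utility calculation.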


[Figure: Price Negotiations. Series PB, PS1, PS2, PS3; horizontal axis ticks 1 to 11; vertical axis 200.00 to 1400.00.]

Fig. 3. Price Negotiation Process

TABLE II. NEGOTIATION PROCESS WITHOUT BUYER'S ANALYTIC TOOLS

Round | Buyer: P Q TD | Seller1: P Q TD | Seller2: P Q TD | Seller3: P Q TD
0 500.00 10.00 7.00 1000.00 20.00 21.00 1200.00 5.00 18.00 1300.00 20.00 14.00
1 501.00 11.00 9.00 999.00 19.00 19.00 1198.60 5.50 16.43 1294.67 19.00 13.00
2 519.63 12.45 11.22 980.37 17.55 16.78 1172.50 6.50 14.78 1284.11 17.55 12.11
3 541.96 13.68 12.71 958.04 16.32 15.29 1140.86 7.93 13.83 1269.39 16.32 11.87
4 579.93 14.45 13.52 920.07 15.55 14.48 1086.21 9.61 13.47 1251.50 15.55 12.14
5 624.69 14.83 13.88 875.31 15.17 14.12 1019.58 11.29 13.49 1232.20 15.17 12.65
6 671.25 14.97 13.98 828.75 15.03 14.02 946.21 12.72 13.66 1212.75 15.03 13.18
7 710.88 15.00 14.00 789.12 15.00 14.00 877.03 13.77 13.82 1193.92 15.00 13.58
8 739.96 15.00 14.00 760.04 15.00 14.00 815.29 14.43 13.92 1175.80 15.00 13.82
9 751.89 15.00 14.00 748.11 15.00 14.00 770.54 14.77 13.97 1158.13 15.00 13.94
10 745.66 15.00 14.00 754.34 15.00 14.00 739.76 14.92 13.99 1140.12 15.00 13.98
11 714.73 15.00 14.00 785.27 15.00 14.00 718.77 14.98 14.00 1120.73 15.00 14.00
12 657.62 15.00 14.00 842.38 15.00 14.00 722.04 15.00 14.00 1098.01 15.00 14.00
13 566.21 15.00 14.00 933.79 15.00 14.00 753.91 15.00 14.00 1068.99 15.00 14.00
14 435.44 15.00 14.00 1064.56 15.00 14.00 820.69 15.00 14.00 1027.55 15.00 14.00
15 247.77 15.00 14.00 1252.23 15.00 14.00 935.62 15.00 14.00 959.56 15.00 14.00
16 -53.23 15.00 14.00 1553.23 15.00 14.00 1141.74 15.00 14.00 806.34 15.00 14.00
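
The price columns of Table II can be checked against the outcome reported in the text with a short sketch. It assumes, as an illustration only, that a deal with a seller becomes possible in the first round where the buyer's price offer is at least that seller's asking price; the paper does not state its acceptance rule in this form. The data are rounds 0 to 11 of Table II.

```python
from typing import Dict, Optional, Sequence, Tuple

# Price offers from Table II, rounds 0 to 11.
buyer = [500.00, 501.00, 519.63, 541.96, 579.93, 624.69,
         671.25, 710.88, 739.96, 751.89, 745.66, 714.73]
sellers = {
    "S1": [1000.00, 999.00, 980.37, 958.04, 920.07, 875.31,
           828.75, 789.12, 760.04, 748.11, 754.34, 785.27],
    "S2": [1200.00, 1198.60, 1172.50, 1140.86, 1086.21, 1019.58,
           946.21, 877.03, 815.29, 770.54, 739.76, 718.77],
    "S3": [1300.00, 1294.67, 1284.11, 1269.39, 1251.50, 1232.20,
           1212.75, 1193.92, 1175.80, 1158.13, 1140.12, 1120.73],
}


def first_crossing(bids: Sequence[float],
                   asks: Sequence[float]) -> Optional[Tuple[int, float]]:
    """Return (round, asking price) for the first round in which the bid
    meets or exceeds the ask, or None if that never happens."""
    for round_no, (bid, ask) in enumerate(zip(bids, asks)):
        if bid >= ask:
            return round_no, ask
    return None


results: Dict[str, Optional[Tuple[int, float]]] = {
    name: first_crossing(buyer, asks) for name, asks in sellers.items()
}
```

Under this assumed rule the crossing prices are 748.11 for S1 and 739.76 for S2, with no crossing for S3, which matches the figures quoted in the text for the accepted and rejected proposals.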

[Figure: Quantity Negotiations. Series QB, QS1, QS2, QS3; horizontal axis ticks 1 to 11; vertical axis 0.00 to 25.00.]

Fig. 4. Quantity Negotiation Process

[Figure: Delivery Time Negotiations. Series TDB, TDS1, TDS2, TDS3; horizontal axis ticks 1 to 11; vertical axis 0.00 to 25.00.]

Fig. 5. Delivery Time Negotiation Process


VII. CONCLUSIONS AND FUTURE WORK

In this work a description of a B2C e-commerce negotiation model is presented. The primary job of this model is to conduct negotiations on behalf of prospective buyers and sellers' representatives. It employs multiple software agents that represent specific functions of the system and applies big data analytics. Based on analytics results, agents are able to improve their behaviors over time and take proactive and reactive negotiation actions. From that analytics knowledge, they may get better at selecting and achieving goals and taking correct actions.

The system provides a customizable user interface. Information filled in by the buyer is stored in the buyer's profile and used for generation of the original offer. Negotiations are conducted by multiple negotiator agents with several organizations in parallel to speed up the negotiation process; the best counter-offer is selected by the agent server and presented to the buyer.

Our future research will concentrate on developing a secure, fact-based, agent-based e-commerce negotiation system.

REFERENCES
[1] P. Braun, J. Brzostowski, G. Kersten, J. B. Kim, R. Kowalczyk, et al., "Intelligent decision-making support systems: foundations, applications and challenges", Springer, London, 2006, pp. 271-300.
[2] A. Rajpurohit, "Big data for business managers: bridging the gap between potential and value", 2013 IEEE International Conference on Big Data, 2013, pp. 29-31.
[3] H. Chen, R. H. L. Chiang, and V. C. Storey, "Business intelligence and analytics: from big data to big impact", MIS Quarterly, vol. 36(4), 2012, pp. 1165-1188.
[4] K. D. Ahmadi, N. M. Charkari and N. Enami, "E-negotiation system based on intelligent agents in B2C e-commerce", Advances in Information Sciences and Service Sciences, vol. 3(2), March 2011.
[5] L. Liu, Y. Ma, and C. Wen, "Multilateral multi-issue automated negotiation model based on GA", Int. Conf. on Intelligent Computation Technology and Automation, 2008, pp. 85-89.
[6] K. C. Lee, H. Lee and N. Lee, "Agent based mobile negotiation for personalized pricing of last minute theatre tickets", Expert Systems with Applications, vol. 39, 2012, pp. 9255-9263.
[7] F. Matos and E. Madeira, "An automated negotiation model for m-commerce using mobile agents", ICWE'03 Proc. of the 2003 Int. Conf. on Web Engineering, Springer-Verlag, 2003, pp. 72-75.
[8] A. Kattan, Y.-S. Ong, and E. Galván-López, "Multi-agent multi-issue negotiations with incomplete information: a genetic algorithm based on discrete surrogate approach", IEEE Congress on Evolutionary Computation, 2013.
[9] S. J. Hussain and M. Kumar, "Meta architecture to support layered banking systems", in Proc. of ICET'06, Int. Conf. on Emerging Technologies, vol. 13-14, pp. 754-760.
[10] M. I. Bala, S. Vij and D. Mukhopadhyay, "Intelligent agent for prediction in e-negotiation: an approach", Int. Conf. on Cloud & Ubiquitous Computing & Emerging Technologies, 2013, pp. 183-187.
[11] Rajesh and Shiv Kr. Tayal, "An intelligent multi agent framework for e-commerce using case based reasoning and argumentation for negotiation", Int. J. of Business & Man. Research, vol. 2(6), 2012, pp. 232-243.
[12] K. Ronglong and J. Tongqiang, "E-commerce automated negotiation model research based on multi-agent", IHMSC, 2013 5th Int. Conf. on Intelligent Human-Machine Systems and Cybernetics, vol. 1, 2013.
[13] M. Bjelica and Z. Petrović, "A framework for QoS negotiation in future mobile networks", WSEAS Trans. on Com., 2004.
[14] K. Fu and G. Nie, "Application of particle swarm optimization algorithm in e-commerce negotiation", The 6th Wuhan Int. Conf. on E-Business - e-Business Track, 2001.
[15] G. Bruns and M. Cortes, "A hierarchical approach to service negotiation", 2011 IEEE Int. Conf. on Web Services (ICWS), 2011, Washington, DC, pp. 460-467.
[16] P. Li and Y. Zhong, "A new multiple-attribute negotiation model of mobile e-commerce", ISECS '09 Proc. of the 2009 2nd Int. Symp. on Electronic Commerce and Security, IEEE Comp. Soc., 2009, vol. 2, pp. 14-18.
[17] M. Cao and X. Da, "Multi-strategy selection model for automated negotiation", 47th Hawaii Int. Conf. on System Science, 2014.
[18] K. Hindriks, C. M. Jonker and D. Tykhonov, "Let's dance! An analytic framework of negotiation dynamics and strategies", Web Intelligence and Agent Systems: An Int. J., vol. 9, 2011, pp. 319-335.
[19] F. H. Zulkernine and P. Martin, "An adaptive and intelligent SLA negotiation system for web services", IEEE Trans. on Services Computing, vol. 4(1), pp. 31-43.
[20] I. Serguievskaia, H. Al-Sakran, and J. Atoum, "A multi-agent experience based e-negotiation system", Inf. and Com. Technologies, ICTTA'06, 2006.
[21] H. Al-Sakran, M. Alsudairi, and I. Serguievskaia, "Mobile e-loan negotiation architecture", J. of Internet Banking and Commerce, 2007, vol. 12(2), (http://www.arraydev.com/commerce/jibc/).
[22] T. H. Davenport, L. D. Mule and J. Lucker, "Know what your customers want before they do", Harvard Business Review, December 2011.
[23] R. S. Sathyanarayanan, "Customer analytics - the genie is in the detail", J. of Marketing & Communication, vol. 7(3), 2012.
[24] D. S. Putler and R. E. Krider, "Customer and business analytics: applied data mining for business decision making using R", Int. Stat. Review, Aug. 2013, pp. 328-328.
[25] S. Earley, "Big data and predictive analytics: what's new?", IT Professional, IEEE Computer Society, 2014, vol. 16(1), pp. 13-15.
[26] W.-M. Lee and C.-C. Hsu, "An intelligent negotiation strategy prediction system", in Proc. of the 9th Int. Conf. on Machine Learning and Cybernetics, Qingdao, 11-14 July 2010.
[27] R. H. Guttman and P. Maes, "Cooperative vs. competitive multi-agent negotiations in retail electronic commerce", in Proc. of the 2nd Int. Workshop on Cooperative Information Agents, vol. 1435, pp. 135-147.


A Study of MCA Learning Algorithm for Incident Signals Estimation

Rashid Ahmed
National Technical University of Athens
School of Electrical and Computer Engineering
9, Iroon Polytechniou St., 15773 Athens, Greece

John N. Avaritsiotis
National Technical University of Athens
School of Electrical and Computer Engineering
9, Iroon Polytechniou St., 15773 Athens, Greece

Abstract—Many signal subspace-based approaches have already been proposed for determining the fixed Direction of Arrival (DOA) of plane waves impinging on an array of sensors. Two neural-network-based procedures for DOA estimation are presented. Firstly, Principal Component Analysis (PCA) is employed to extract the maximum eigenvalue and eigenvector from the signal subspace to estimate DOA. Secondly, Minor Component Analysis (MCA) is a statistical method of extracting the eigenvector associated with the smallest eigenvalue of the covariance matrix. In this paper, we modify an MCA learning algorithm to enhance its convergence, which is essential for practical applications of the MCA algorithm. The learning rate parameter is also presented, which ensures fast convergence of the algorithm, because it has a direct effect on the convergence of the weight vector, and the error level is affected by this value. MCA is performed to determine the estimated DOA. Simulation results are furnished to illustrate the theoretical results achieved.

Index Terms—Direction of Arrival; Neural networks; Principal Component Analysis; Minor Component Analysis

I. INTRODUCTION

Neural networks have seen an explosion of interest over the last few years and are being successfully applied across an extraordinary range of problem domains, in areas as diverse as finance, medicine, engineering, geology, physics and biology. The excitement stems from the fact that these networks are attempts to model the capabilities of the human brain. From a statistical perspective neural networks are interesting because of their potential use in prediction and classification problems [1,2,3]. A neural network is an information-processing system that has certain performance characteristics in common with biological neural networks. Many methods for the estimation of the Direction of Arrival (DOA) have been proposed.

Dovid Levin et al. [4] showed that the problem of SRP maximization with respect to a vector-sensor can be solved with a computationally inexpensive algorithm. A maximum likelihood (ML) DOA estimator is derived and subsequently shown to be a special case of DOA estimation by means of a search for the direction of maximum steered response power (SRP). The ML estimator achieves asymptotic efficiency and thus outperforms existing estimators with respect to the mean square angular error (MSAE) measure. The beampattern associated with the ML estimator is shown to be identical to that used by the minimum power distortionless response beamformer for the purpose of signal enhancement.

Mitsuharu M. et al. [5] introduced the multiple signal classification (MUSIC) method that utilizes the transfer characteristics of microphones located at the same place, namely aggregated microphones. The conventional microphone array realizes a sound localization system according to the differences in the arrival time, phase shift and the level of the sound wave among the microphones. Therefore, it is difficult to miniaturize the microphone array.

Gao F. et al. [6] proposed a new spectral search-based direction-of-arrival (DOA) estimation method that extends the idea of the conventional ESPRIT DOA estimator to a much more general class of array geometries than assumed by the conventional ESPRIT technique.

In the context of DOA, the minor component is the direction in which the data have the smallest variance. Although eigenvalue decomposition or singular value decomposition can be used to extract the minor component, these traditional matrix algebraic approaches are usually unsuitable for high-dimensional online input data. Neural networks can be used to solve the task of MCA learning [7]. Other classical methods involve costly matrix inversions, as well as poor estimation performance when the signal to noise ratio and number of samples are small and too large, respectively [8].

In many practical applications, a PCA algorithm deteriorates with decreasing signal to noise ratio, and it may diverge in some cases owing to the learning rate, giving incorrect results [9]. For this reason, we need to handle this situation in order to overcome the divergence problem. In this context, we present an MCA(R) learning algorithm that has a low computational complexity. This allows the algorithm to update quickly (converge) to extract the smallest eigenvalue and eigenvector, which can be used to estimate DOA.

The paper is organized as follows. In Section II, we discuss the array signal model and also give a theoretical review of some existing Principal Component Analysis (PCA) and Minor Component Analysis (MCA) algorithms. In Section III, we first present the model for DOA measurements and then introduce the modified MCA algorithm. Finally, in the same section, convergence is analyzed. Simulation results are included in Section IV to evaluate the convergence of the algorithm by comparison with the aforementioned algorithms [10], and we verify our theoretical


findings by comparing the algorithm results with the DOA. Finally, conclusions are drawn in Section V.

II. SIGNAL MODEL AND LEARNING ALGORITHMS FOR PCA AND MCA

A. Signal Model

Consider an array of omnidirectional sensors. The medium is assumed to be isotropic and non-dispersive. Since far-field source targets are assumed, the source wavefronts are approximated by plane waves. Then, for narrowband source signals, we can express the sensor outputs as the sum of the shifted versions of the source signals.

Consider a Uniform Linear Array (ULA) of m omnidirectional sensors illuminated by l narrow-band signals (l<m). At the l'th snapshot the output of the i'th sensor may be described by [11]
∑ √ ) ) )
Where is the space between two adjacent sensors, the angle of arrival, d signals incident onto the array, the normalized frequency. The incoming waves are assumed to be plane. The output of the array sensors is affected by white noise, which is assumed to be uncorrelated with the incoming signals. In vector notation, the output of the array resulting from l complex signals can be written as:
) ) ) ) )
Where the vectors
) ) are defined as:
) ) )
) ) )
) ) )
And )is the matrix of steering vectors,
is the target DOA parameter vector,
) ) )
Moreover,
) ⁄ )

B. Learning Algorithm for PCA

Consider the linear neural unit described by
) )
Where the input vector, represents the weight vectors and y denotes the neuron's output. The unit is used for extracting the first principal component from the input random signal, that is )should represent )in the best way, in the sense that the expectation error should be minimized.
[ ]
Here denotes mathematical expectation with respect to under the hypothesis . The problem may be expressed as,
Solve: * + )

Consider the feedforward network shown in Fig. 1. The following two structural assumptions are made:
1) Each neuron in the output layer of the network is linear.
2) The network has m inputs and l outputs, both of which are specified. Moreover, the network has fewer outputs than inputs (i.e. l<m).
The only aspect of the network that is subject to training is the set of synaptic weights $w_{ji}$ connecting source nodes i in the input layer to computation nodes j in the output layer, where $i = 1, 2, \dots, m$ and $j = 1, 2, \dots, l$.
The output $y_j(n)$ of neuron j at time n, produced in response to the set of inputs $x_i(n)$, $i = 1, 2, \dots, m$, is given by
$$y_j(n) = \sum_{i=1}^{m} w_{ji}(n)\, x_i(n)$$
The synaptic weight $w_{ji}(n)$ is adapted in accordance with a generalized form of Hebbian learning [12,13] according to PCA, as shown by:
$$\Delta w_{ji}(n) = \eta \left[ y_j(n)\, x_i(n) - y_j(n) \sum_{k=1}^{j} w_{ki}(n)\, y_k(n) \right]$$
Where $\Delta w_{ji}(n)$ is the change applied to the synaptic weight $w_{ji}(n)$ at time n, and η is the learning rate parameter, greater than zero.
This principal component analysis algorithm has been found very useful for extracting the most representative low-dimensional subspace from a high-dimensional vector space. It is widely employed to analyze multidimensional input vectors, such as hundreds of different stock prices; however, when used in signal processing, this algorithm deteriorates with decreasing signal-to-noise ratio [12].

Fig. 1. Oja's single-layer linear neural network

C. Learning Algorithm for MCA

The opposite of PCA is Minor Component Analysis (MCA), a statistical method of extracting the eigenvector associated with the smallest eigenvalue of the covariance matrix of input signals. As an important tool for signal processing and data analysis, MCA has been widely applied


to: total least squares (TLS) [14], clutter cancellation [15], curve and surface fitting [16], digital beamforming [17], bearing estimation [18], etc. A single linear neuron can be used to extract the minor component from input signals adaptively. The eigenvector associated with the smallest eigenvalue of the covariance matrix is called the minor component; one seeks the directions that minimize the projection variance, and these directions are the eigendirections corresponding to the minimum eigenvalue. The applications of MCA arise in total least squares and eigenvalue-based spectral estimation methods [19,20]. It allows the extraction of the first minor component from a stationary multivariate random process, based on the definition of a cost function to be minimized under the right constraints. The extraction of the least principal component is usually referred to as MCA. For the first minor component, what must be found is the weight vector that minimizes the power $E\{y^2\}$ of the neuron's output.
For convenience, we produce a cost function for minor component estimation, so that the problem is minimizing the cost function
{ ) )} )
) )
With respect to the weight vector, its gradient has the expression,
⁄
Thus the optimal multiplier may be found by vanishing , that is by solving,
⁄
Now the main point is to recognize that from an optimization point of view the above system is equivalent to:
⁄ ) ,
Where , is a constant. It can be proven that the first minor converges to the expected solution provided that the constant β is properly chosen. This is the way to compute the optimal multiplier to obtain the stabilized learning rule [16]. The most exploited solution to the aforementioned problems consists of invoking the discrete-time versions of first minor, as
) ) )
Where η is the learning rate. It is common practice to make η sufficiently small, which ensures good convergence in a reasonably short time. This represents the discrete-time stochastic counterpart of the first minor rules. Neural network MCA learning algorithms can be used to adaptively update the weight vector and reach convergence to the minor component of the input data. In the first order the linear MCA will be:
) ) ) ) ) ) )

For multiple outputs (neurons), the output )of neuron j, is produced in response to the set of input,
)
And is given by,
) ∑ ) ) )

The synaptic weight is adapted in accordance with the generalized form of Hebbian learning, where the target of MCA is to extract the minor component from the input data by updating the weight vector )adaptively, for all ) , as,
) [ ) ) )∑ ) )] )
Where ) , is the change applied to the synaptic weight ) at time, and Examining Eq. 11, the term, ) ) on the right-hand side of the equation is related to Hebbian learning. As for the second term,
)∑ ) )
it is related to a competitive process that goes on among the synapses in the network. Simply put, as a result of this process, the most vigorously growing (i.e., fittest) synapses or neurons are selected at the expense of the weaker ones. Indeed, it is this competitive process that alleviates the exponential growth in Hebbian learning working by itself. Note that stabilization of the algorithm through competition requires the use of a minus sign on the right-hand side of Eq. 11. The distinctive feature of this algorithm is that it operates in a self-organized manner. This is an important characteristic of the algorithm that befits it for on-line learning. The generalized Hebbian form of Eq. 11 for a layer of neurons includes the algorithm of Eq. 9, as
) ) )
Hence that,
) ) [ ) ) ) )] )

III. DOA MEASUREMENT MODEL AND MCA MODIFIED ALGORITHM

A. DOA Model

This algorithm uses measurements made on the signal received by an array of sensors. The wavefronts received by the m sensor array elements are a linear combination of the d incident waveforms and noise. The MCA begins with the following model of the received input data vector, which is expressed as:
[ ] ) )[ ] [ ] )


Where S is the vector of incident signals, N is the noise vector and ) is the array steering vector corresponding to the DOA of the i'th signal. The received vector X and the steering vector )as vector in m dimensional space, the input matrix can be expressed [21]:
)
In many practical applications, the smallest eigenvalue of the matrix R of input data is usually larger than zero due to the noisy signals. The column vectors of the steering vectors are perpendicular to the eigenvector corresponding to the noise. The MCA spectrum may be expressed as,
) ⁄ ) ) )
The matrix is a projection matrix onto the noise subspace. For steering vectors that are orthogonal to the noise subspace, the denominator of Eq. 16 will become very small and thus the peaks will occur in )corresponding to the angle of arrival of the signal. Where the ensemble average of the array input matrix R is known and the noise can be considered uncorrelated and identically distributed between the elements [22].

TABLE I. A SUMMARY OF DIFFERENT DOA ALGORITHMS
No.  Method  Power spectral as function of, )  Subspace
1    PCA     )                                 Signal subspace
2    MCA     ) )                               Noise subspace

B. The Modified MCA Algorithm
The algorithm is based on MCA learning, which allows it to update quickly and to extract the smallest eigenvalue and eigenvector, which can then be used to estimate DOA. The learning rate parameter is also presented, which ensures fast convergence of the algorithm.

To develop insight, the behavior of the GHA can be shown as:
In the last section, the weight vector yielded by GHA can further be modified by adding to Hebbian rule (where the learning rate is often employed as small value) and a positive value , that is greater than the largest eigenvalue of matrix . Recall from Section II-C, we can obtain the modified MCA algorithm as follows
) )
)[ ̂ ) )) )]
) )[ ̂ ) ) )) ̂ )]
By taking as ) ) for the modified MCA algorithm, and , ⁄ , that is,
( ) )) ) )
Convergence analysis needs to confirm that Eq. 17 will converge to the minor component subject to the learning rate.

C. Convergence Analysis
In order to confirm that the weight vector will converge to the minor component of the input data in Eq. 17, it is important to discuss the learning rate η because it has a direct effect on the convergence.
For convenience of analysis, since the matrix is a symmetrical nonnegative definite matrix.
The weight vector has unit length, that is
‖ ‖
‖ )‖ ‖ )‖ ‖ )‖
‖ )‖ )
) ) ‖ )‖ ‖ )‖
Let us assume that be all the eigenvalues of matrix , are ordered by,
Where )is the initial weight vector, is the largest eigenvalue of the matrix and the eigenvector associated with the smallest eigenvalue of .
Suppose a definition of an invariant set as
The convergence analysis shows the learning rate suppose,
⁄ ‖ ‖ ⁄ ,
Where select ) is the initial weight vector, is the largest eigenvalue of the matrix and the eigenvector associated with the smallest eigenvalue of , that
,
For ⁄ , it follows that
From Eq. 17, the condition is satisfied,
̂ ) ) ∑ ) )
‖ )‖ ∑ ‖ )‖
For ‖ ‖ ⁄ ,
‖ )‖ ‖ )‖
⁄

IV. SIMULATION RESULTS
In this section we describe our simulation results. We compare the convergence of our modified algorithm with the aforementioned approaches by choosing a suitable learning rate, where η should satisfy ⁄ . Programs were written for DOA estimation in Matlab. A general test example is used for this purpose, with two sources located in the far field at ( , ) degrees with normalized frequencies of (0.35, 0.36) fs, respectively. A ULA of eight sensors with sensor spacing equal to half a wavelength ( ) and five snapshots (L) was used to collect the data.
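
The procedure of Section III and the test set-up above can be sketched roughly as follows. This is a minimal Python/NumPy sketch under stated assumptions, not the authors' Matlab program: it assumes a standard narrowband ULA model with half-wavelength spacing, extracts the minor component by iterating an averaged Oja-type rule on σI − R with σ chosen larger than the largest eigenvalue of R (one common way to stabilize MCA, which may differ from the exact update of Eq. 17), and evaluates the spectrum 1/|a(θ)^H w|². The function names, the source angles (left unspecified in the text above), the noise level and the snapshot count are all hypothetical, and the normalized source frequencies are not modelled.

```python
import numpy as np


def steering_vector(theta_deg, m, spacing=0.5):
    """ULA steering vector for m sensors; spacing is given in wavelengths."""
    k = np.arange(m)
    return np.exp(2j * np.pi * spacing * k * np.sin(np.deg2rad(theta_deg)))


def minor_component(R, eta=0.5, iters=5000, seed=0):
    """Estimate the unit eigenvector of the smallest eigenvalue of a
    Hermitian positive semidefinite matrix R.

    Assumption: an averaged Oja-type rule is applied to B = sigma*I - R,
    so the principal component of B is the minor component of R.
    """
    m = R.shape[0]
    sigma = np.trace(R).real + 1.0      # trace(R) >= largest eigenvalue of R
    B = sigma * np.eye(m) - R
    step = eta / sigma                  # scaled step keeps the iteration stable
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(m) + 1j * rng.standard_normal(m)
    w /= np.linalg.norm(w)
    for _ in range(iters):
        y = B @ w
        c = np.vdot(w, y)               # w^H B w (real for Hermitian B)
        w = w + step * (y - c * w)      # Hebbian term minus normalizing term
        w /= np.linalg.norm(w)          # keep unit length
    return w


def mca_spectrum(w, m, angles_deg, spacing=0.5):
    """Evaluate 1 / |a(theta)^H w|^2 on a grid of candidate angles."""
    A = np.stack([steering_vector(t, m, spacing) for t in angles_deg], axis=1)
    return 1.0 / (np.abs(A.conj().T @ w) ** 2 + 1e-12)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    m, n_snap = 8, 200                  # eight sensors; snapshot count is hypothetical
    doas = [-20.0, 30.0]                # hypothetical source angles in degrees
    A = np.stack([steering_vector(t, m) for t in doas], axis=1)
    S = rng.standard_normal((2, n_snap)) + 1j * rng.standard_normal((2, n_snap))
    N = 0.1 * (rng.standard_normal((m, n_snap))
               + 1j * rng.standard_normal((m, n_snap)))
    X = A @ S + N                       # received data, one column per snapshot
    R = X @ X.conj().T / n_snap         # sample covariance matrix
    w = minor_component(R)
    grid = np.arange(-90.0, 90.5, 0.5)
    P = mca_spectrum(w, m, grid)
    print("largest spectrum value at", grid[np.argmax(P)], "degrees")
```

Because the step is scaled by σ, the iteration in this sketch stays stable for any `eta` in (0, 1]; a larger `eta` speeds up convergence, which mirrors the learning-rate trade-off examined in Simulation 1. A single minor component can also produce spurious peaks, so the printed angle should be read as illustrative only.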


A. Simulation 1: Effect of varying the learning rate parameter
In this simulation we show that varying the learning rate parameter has a direct effect on the convergence of the weight vector. When the learning rate has a large step size, as shown in Fig. 2, the algorithm updates quickly, but the estimate of the optimum solution may wander significantly until the algorithm reaches convergence and the error reaches zero. When the learning rate has a small step size, the convergence will typically be painfully slow. A small step size may be chosen to reduce this wandering until the desired accuracy is achieved, but it will require a long time for the algorithm to reach the optimum solution (fittest eigenvalue). Therefore, a suitable learning rate should be selected in order to prevent learning divergence, because an unsuitable value will make the algorithm deviate drastically from normal learning, which may result in divergence or an increased learning time.

Fig. 2. Learning rate step when η = (0.01 and 0.1)

Fig. 3. Comparison of convergence of algorithms

B. Simulation 2: Comparison of methods with regard to convergence
Fig. 3 shows the comparison of convergence of the modified MCA with the aforementioned algorithms. It shows the high performance of the modified algorithm, which has a better convergence result than the PCA and ordinary MCA algorithms. This is a result of choosing a more suitable learning rate, where the learning rate influences the overall rate of convergence. A smaller learning rate is selected.

C. Simulation 3: Studying the performance effectiveness of the modified algorithm
In this simulation, in order to illustrate the effectiveness of the algorithm, we used DOA estimation based on the modified MCA. The direction estimates obtained are better than those of the PCA algorithm.
1) Effect of Changing the Number of Snapshots
• Figures 4 and 5 show the estimated DOA of the incoming signals. The spectral peaks of the modified MCA for multiple sources become better when the number of snapshots is increased, as shown in Fig. 5, when the number of snapshots is equal to five.
• Figures 6 and 7 show the estimated DOA of two sources for incoming signals, with a changing number of snapshots. Likewise, the spectral peaks of PCA become sharper and the resolution increases when the number of snapshots is increased, as shown in Fig. 7, when the number of snapshots equals five.
2) Effect of added white noise vector
Figures 8 and 9 show the estimated DOA of two sources for incoming signals in PCA and modified MCA, respectively, in order to compare the modified MCA performance with PCA when the input vector is affected by a white noise vector. Fig. 9 shows that the modified MCA estimates the right angles, with a spectrum of better accuracy than the PCA spectrum shown in Fig. 8.

Fig. 4. Estimation DOA by modified MCA when number of snapshots L < 5


Fig. 5. Estimation DOA by modified MCA when number of snapshots L = 5

Fig. 6. Estimation DOA by PCA when number of snapshots L < 5

Fig. 7. Estimation DOA by PCA when number of snapshots L = 5

Fig. 8. Estimation DOA by PCA when additive noise N = 0.009 dB

Fig. 9. Estimation DOA by modified MCA when additive noise N = 0.009 dB

V. CONCLUSION
This paper presented a prototype direction of arrival estimation. In this study, a simple MCA learning algorithm is presented to extract the minor component from input signals and to enhance the convergence. The learning rate parameter is also presented, which ensures fast convergence of the algorithm. The results show that the modified MCA converges quickly to the minor component, subject to the learning rate. In this context, the learning rate should usually be set at a suitable value to reach the optimum solution and to move the algorithm closer in the "correct" direction.
Also, this demonstration shows that the modified MCA algorithm produces the right angle θ for the DOA when the input vector is affected by a white noise vector, better than the PCA algorithm, which fails to produce a value for the DOA above a certain level of noise.


The main advantage of this algorithm is that it can better tolerate noisy signals when extracting the minimum eigenvalue from the noise subspace, and it has been applied to find the DOA estimation.

ACKNOWLEDGMENT
This research is supported by the School of Electrical and Computer Engineering, National Technical University in Athens (NTUA), Greece

REFERENCES
[1] Alexander I. Galushkin, "Neural Networks Theory", Springer-Verlag Berlin Heidelberg, 2007, ISBN 0-387-94162-5.
[2] Timo Honkela, Włodzisław Duch, "Artificial Neural Networks and Machine Learning – ICANN 2011", 21st International Conference on Artificial Neural Networks, Espoo, Finland, June 14-17, 2011, Proceedings.
[3] G. Dreyfus, "Neural Networks, Methodology and Applications", original French edition published by Eyrolles, Springer, Paris, 2004, ISBN 103-540-22980.
[4] Dovid Levin, Emanuel A., Sharon G., "Maximum Likelihood Estimation of Direction of Arrival using an acoustic vector-sensor", International Audio Laboratories Erlangen, Germany, 2012.
[5] Mitsuharu M., Shuji H., "Multiple Signal Classification by Aggregated Microphones", 2005, IEICE, ISSN: 0916-8508.
[6] Feifei Gao and Alex B. Gershman, "A Generalized ESPRIT Approach to Direction-of-Arrival Estimation", IEEE Signal Processing Letter, Vol. 12, No. 3, March 2005.
[7] Rashid Ahmed, John A. Avaritsiotis, "MCA Learning Algorithm for Incident Signals Estimation: A Review", IJCTT Journal, Feb. 2014.
[8] G. Wang and X.-G. Xia, "Iterative Algorithm for Direction of Arrival Estimation with wideband chirp signals", IEEE, 2000, ISSN: 1350-2395.
[9] Weidong J., Shixi Y., Yongping C., "DOA Estimation of Multiple Convolutively Mixed Sources Based on Principle Component Analysis", Springer, ICONIP, pp. 340–348, 2009.
[10] Qingfu Zhang, Yiu-Wung Leung, "A Class of Learning Algorithms for Principal Component Analysis and Minor Component Analysis", IEEE Transaction on Neural Network, Vol. 11, No. 2, March 2000.
[11] Adnan S., "DOA Based Minor Component Estimation using Neural Networks", AJES, Electrical Engineering Dept., Vol. 3, No. 1, 2010.
[12] Yanwa Zhang, "CGHA For Principal Component Extraction In The Complex Domain", IEEE Transaction on Neural Networks, Vol. 8, No. 5, pp. 1031-1036, Sept. 1997.
[13] Kwang In Kim, Matthias O. Franz, Bernhard, "Kernel Hebbian Algorithm for Iterative Kernel Principal Component Analysis", Max Planck Institute for Biological Cybernetics, June 2003.
[14] K. Gao, M.O. Ahmad, M.N. Swamy, "Learning Algorithm for Total Least Squares Adaptive Signal Processing", Electronics Letters, Feb. 1992.
[15] S. Barbarossa, E. Daddio, G. Galati, "Comparison of Optimum and Linear Prediction Technique for Clutter Cancellation", Communications, Radar and Signal Processing, IEE Proceedings, ISSN 0143-7070.
[16] L. Xu, E. Oja, C. Suen, "Modified Hebbian Learning for Curve and Surface Fitting", Neural Networks, 1992.
[17] J.W. Griffiths, "Adaptive Array Processing, A Tutorial", Communications, Radar and Signal Processing, IEE Proceedings, ISSN 0143-7070.
[18] R. Schmidt, "Multiple Emitter Location and Signal Parameter Estimation", IEEE Trans. Antennas Propagation (1986) 276–280.
[19] Dezhong Peng, Zhang Yi, "A New Algorithm for Sequential Minor Component Analysis", International Journal of Computational Intelligence Research, ISSN 0973-1873, Vol. 2, No. 2 (2006).
[20] Jie Luo, Xieting Ling, "Minor Component Analysis with Independent to Blind 2 Channel Equalization", IEEE, Fudan University-China.
[21] Donghai Li, Shihai Gao, Feng Wang, "Direction of Arrival Estimation Based on Minor Component Analysis Approach", Neural Information Processing, Springer-Verlag Berlin Heidelberg, 2006.
[22] Belloni F., Richter A., Koivunen V., "DOA Estimation via Manifold Separation for Arbitrary Array Structures", IEEE Transaction on Signal Processing, Vol. 55, No. 10, October 2007.

AUTHOR PROFILE
Rashid Ahmed is a Ph.D student in the School of Electrical and Computer Engineering, National Technical University of Athens-NTUA, Greece, and is a student member of the IEEE.

John N. Avaritsiotis is a Professor of Microelectronics in the Department of Electrical and Computer Engineering of the National Technical University of Athens-NTUA. He has published over 80 technical articles in various scientific journals, and has presented more than 30 papers at international conferences. His present research interests concern the development of surface micro-machining processes for the production of micromechanical sensors and design and prototyping of various types of multi-sensor systems for various applications. He is the Director of two R&D Laboratories: the Microelectronics Lab and the Electronic Sensors Lab of NTUA. He is the Co-Editor of the Journal Active and Passive Electronic Devices, Guest Editor of IEEE Transactions on Components, Packaging and Manufacturing Technology, Senior Member of IEEE and Member of IOP and ISHM.


Empirical Analysis of Public ICT Development Project Objectives in Hungary

Márta Aranyossy
Department of Enterprise Finances
Corvinus University of Budapest
Budapest, Hungary

András Nemeslaki
Institute of E-Public Service Development
National University of Public Service
Budapest, Hungary

Adrienn Fekó
Prime Minister's Office
Public Administration Reform Programmes Managing Authority
Budapest, Hungary

Abstract—E-government development in most European countries was ensured from Structural Funds in the period of 2007-2014. In our paper we show how Hungary has used these funds in order to achieve efficiency and effectiveness in its public services. The main objective of our research has been to explore the budgetary and timing characteristics of public ICT spending, and to analyze the implicit and explicit objectives of eGovernment projects in Hungary. We applied exploratory text analysis as a novel and objective way to analyze the focus of eGovernment development policy. Our main findings are:

- After the text analysis of 85 Electronic Public Administration Operational Programme (EPAOP) and 65 State Reform Operational Programme (SROP) projects we found that keyword statistics are generally consistent with the main policy level objectives of the Operative Programmes; however, there are some fields which are not emphasized, such as the role of participation, social partners and local government, and the need to improve user skills through public information campaigns.

- Governmental changes are clearly reflected in the goal hierarchy: contracting in EPAOP and SROP happened in two separate waves. A significant part of the financing was committed during stabilized governments in the beginning and end phases of the planning period, with a relatively passive period during the governmental change in 2010-2011.

Keywords—eGovernment; eGovernment strategy; eGovernment policy; eGovernment goals

I. INTRODUCTION

E-government, that is "the use of ICT and its application by the government for the provision of information and public services to the people" [1], is seen as a driver of government effectiveness and as a key source of competitiveness and economic growth worldwide and in the EU 28 member states [1], [2]. The EU has therefore been continuously pushing digital agenda policies and aligning financial support for eligible member states for e-government development. Effective use of these funds and closing the gap between advanced and lagging ICT adopters is essential for the EU's global competitiveness and increased social cohesion.

When we look at actual data on eGovernment use (Fig. 1) in the EU28 countries, we find that only 41% of the EU28 population used e-services in 2013, which is down from 44% in 2012 and almost at the same level as in 2011. Currently only 9 out of 28 countries are above 50% eGovernment use, namely DK, NL, SE, FI, FR, LU, AT, SI, BE (although DE and EE are also close to it). In five countries (RO, IT, BG, PL and HR) online public services are used by less than a quarter of the population, with generally little progress in terms of catching up. The difference between the leading e-adopter and the last one (DK and RO) is more than 70%, indicating a huge challenge for the EU's e-cohesion.

Fig. 1. eGovernment use by citizens in the last 12 months, 2012 and 2013, by country (0% to 100%) [3]


When we look at user-centricity and transparency, the results for EU28 (Fig. 2) show that for many countries in the government domains the provision of user-friendly services is already a reality. Some countries still score 50 or less, displaying a rather analogue approach to public service delivery (SK, RO, HU, EL).

Fig. 2. User-centric and transparent eGovernment in the EU [3] (two panels: Transparent eGovernment in the EU, 2012-2013, and User-Centric eGovernment in the EU, 2012-2013; scores from 0 to 100 by country)

Transparency is an important element for increasing the take-up of online public services, since it helps build citizens' trust in public administrations. Data show that, with few exceptions, this important feature is still not positioned at the center of eGovernment strategies for many governments, and the variance between the leaders and the followers is reaching 80%.

Improvement of e-cohesion – closing the digital gap between the leaders and the laggards – is essential in the agenda of the European Commission's (EC) innovation strategy [7]. To provide financial resources for lagging countries, the EC has created the European Structural Regional Funds (ESRF) and the European Social Cohesion Fund (ESF), which among their other targets transfer funds for ICT development to those European regions where GDP per capita is less than 75% of the EU28 average. According to the Nomenclature of Territorial Units for Statistics (NUTS) these countries are called convergent regions, and in the period 2014-2020 14 EU countries belong there.

Convergent region countries choose their own public ICT development strategies in alignment with the ESF and ESRF, resulting in different paths to reach European e-cohesion. In our paper we show how Hungary – as a representative of these countries – has used these investments in order to achieve efficiency and effectiveness in its public services. As we see, Hungary is 16th in e-government usage, but 26th in user-centric and 25th in transparent e-government services. How are these positions justified when, in the period of 2007-2013, Hungary had spent 720 million euros from the ESF and ESRF on ICT-based modernization of government?

Hungary's case [4] is a relevant example of ICT investment effectiveness in the European public sector, especially when we compare how its eGovernment ranking has changed over the 7-year period of the ESF and ESRF investment process (Fig. 3). Before the investment period Hungary's ranking position had been steadily improving: it went up to 27th worldwide and 16th in the EU, from 47th and 28th. Then during the 7-year period we can observe a decline until 2008, a short period of improvement until 2010 and since then a steady fall back again, ending up 39th and 25th in 2014, actually in a worse position than in 2007.

We carried out a detailed exploratory text analysis of project objectives, financial data, timing and duration of over 100 ESF and ESRF projects in order to identify
• basic project value and timing characteristics,
• key public IT development areas and
• major clusters of public ICT investments.
Our research covers the period of 2007-2013, during which more than 720 million euros were invested into ICT-based modernization of Hungarian public administration. The lessons learned from how this amount was deployed are essential to assess project results and impacts, which will appear with a considerable time lag; of more immediate importance, they are also indicative for the 2014-2020 planning period.

Fig. 3. Hungary's eGovernment ranking

Since ESF and ESRF ICT development resources will be very relevant in the following years, and since they impact many countries, we also intend to expand our research question into more general directions, that is, how the "independent" variable of the ICT investment equation is determined using the e-cohesion principles in the rather diverse EU28 environment.

After the introduction our paper is structured as follows. Firstly, we introduce the conceptual background of the two EU funds for public ICT development in Hungary, the main areas and budgetary and timing characteristics. Secondly, we describe the text mining research methodology and statistical tools we have used for analysing project objectives. In this


section we also outline how the text analysis was combined with financial and project duration data in order to assess the value of particular development areas in public administration. In the third section we discuss our results, followed by conclusions and suggestions to expand both the scope and depth of the research.

II. CONCEPTUAL BACKGROUND OF PUBLIC ICT DEVELOPMENTS BASED ON STRUCTURAL AND COHESION FUNDS

The value of ICT investment in the public sector can be assessed by how much it helps to achieve better governance. Outcomes in this respect are connected to the effectiveness and efficiency of public policy execution [5] in areas such as healthcare, education, insurance, taxation and other areas of the modern state [6]. Since the development of these policies is driven by political values, the assessment of the final impact of good governance is determined by citizens' votes in democratic societies. A major difference between e-business and e-government, as some research has outlined, is that while in business the alignment and functional integrity of ICT management has been recognized and practiced, in public administration this has not yet been done as effectively as in business [5].

The ICT-based value creation mechanisms are usually grouped into two main groups for creating an infrastructure for effective public policy. The first group comprises visions related to integration. This group includes ideas such as desiloisation, inter-operability, the one-stop-shop, seamless government and portals. These are part of the wider picture of joined-up government or whole-of-government. The second group relates to governance. In this group we find e-collaboration, e-consultation, e-participation, e-voting and on-line voting, which lead into more holistic concepts such as deliberative democracy and the creative commons [7].

provided by the Electronic Public Administration Operational Programme (EPAOP) while organizational and human resources modernization is ensured by the State Reform Operational Programme (SROP).

EPAOP aims to increase performance in public administration by means of ICT developments. The main objectives of EPAOP are: to reduce administration in the public sector, to improve the level of services and to assure effective operation of public administration. The Electronic Public Administration Operational Programme has two main areas: convergence and regional competitiveness including employment. In order to achieve these, the programme is broken down into five priority axes listed in Table I.

TABLE I. ELECTRONIC PUBLIC ADMINISTRATION OPERATIONAL PROGRAMME (EPAOP)¹

Priority title                                                                                        Fund   Budget (million EUR)   Budget (billion HUF)
Priority 1: Public administration and renewal of the internal processes of administrative services   ERDF   174.086                51.686
Priority 2: Projects promoting access to public administration services                              ERDF   133.186                39.543
Priority 3: Priority projects                                                                         ERDF   83.264                 24.721
Priority 4: Technical assistance in convergence regions                                               ERDF   5.632                  1.672
Priority 5: Technical assistance in the Central Hungary region                                        ERDF   1.526                  0.453
Source: [11]

The mission of the State Reform Operational Programme (SROP) is to enhance the performance of the public administration system through institutional capacity building. The main objectives of SROP are: to improve human resources and to modernise the organisational operation. Accordingly,
Lips argues that e-government is still too techno-centric the priority axes of the operational programme are focusing on
and many public officials associate e-Government with the two main resources of the public administration system, i.e.
technology, with the technology deterministic attitude rather on the development of human resources and on organizational
distant from the administrative complexity and political risk of processes (see Table II).
governance [8]. Lack of strategic alignment then results in Priority axes 1, 2 and 3 are closely coupled on the OP level,
conceptual de-coupling of high-order objectives of governance since IT requirements for the projects of human and
and although several localized value and process improvement organisational objectives of SROPs are financed from
measures can be achieved breakthrough of transformation in EPAOPs. For the remainder of this paper we are going to focus
government is still to come. As Frank Bannister and Regina on the Priority 1, 2 and 3 of both EPAOP and SROP. Table III.
Connolly [7] argues after looking into the past twenty years of illustrates the total amount of contracted projects under
e-government history that the concept of transformative EPAOP and SROP.
government has not proved well defined and most of the time it
used in conjunction with a large list of superfluous adjectives The framework amount of EPAOP is EUR 397.69 million,
appealing to a great audience but missing a systematic while the contracted amount is EUR 454.93 million. The main
breakdown and outline of interplay between technology and reason for this over commitment of the Hungarian Government
public administration [7]. is the fulfilment desire of the so called n+2 and n+3 goals2 after
the closure of the 2007-2013 programming cycle. On the
Modernization of Hungarian public administration is based contrary, in SROP the contracted sum reached only EUR
partly on the transformation of processes and procedures, and
partly on the provision of extensive access to electronic public
1
administration services to citizens. In order to streamline office Applied exchange rate: 296,9 HUF/EUR. The 2013 annual average
work, it is necessary that the procedures are reorganized, Hungarian National Bank HUF/EUR exchange rate – 296,9 – was used for
conversion, but we do not intend to further analyze the currency or exchange
technology is modernized, and these two areas systematically rate related financial aspects of the OPs.
build on each other ([9] - as it is the case in the private sector, 2
N+3 means that the allocation from Structural Funds must be used by the
see also [10]). Service and technology modernization is member states in 3 years or in case of n+2 in 2 years after the commitment.


149.374 million, while the total framework would have allowed EUR 165.783 million. In this OP there were close to 10 projects which were under preparation or in the application phase during the time of our data collection.

TABLE II. STATE REFORM OPERATIONAL PROGRAMME (SROP)

Priority title | Fund | Budget (million EUR) | Budget (billion HUF)
Priority 1: Renewal of processes and organization development | ESF | 79.919 | 23.728
Priority 2: Improving the quality of human resources | ESF | 31.819 | 9.447
Priority 3: Developments in the Central Hungary Region | ESF | 47.420 | 14.079
Priority 4: Technical assistance in the convergence regions | ESF | 4.651 | 1.381
Priority 5: Technical assistance in the Central Hungary Region | ESF | 1.974 | 0.586
Source: [11]

TABLE III. DESCRIPTIVE STATISTICS

 | EPAOP (million EUR) | N | SROP (million EUR) | N
Contract sum - Total | 454.928 | 76 | 149.374 | 59
Priority 1 | 239.772 | 41 | 99.838 | 41
Priority 2 | 178.342 | 28 | 37.636 | 19
Priority 3 | 36.814 | 7 | 11.901 | 4
Contract sum - Average | 5.617 | 85 | 2.332 | 65

Fig. 4 illustrates that contracting in EPAOP and SROP happened in two separate waves with a relatively passive period in 2010-2011. A significant amount was committed relatively late in the planning period, in the second half of 2012 and during 2013; consequently these projects will only close during 2014 or in 2015.

Fig. 4. Hungary’s eGovernment ranking

III. RESEARCH METHODOLOGY

For the detailed analysis of EPAOP and SROP Priority 1, 2 and 3 objectives and project spending we used data available from the website of the Hungarian National Development Agency (palyazat.gov.hu³, 2013) and from the websites of DG Region and DG Employment. We collected the following data and organized them in a standardized format: the contract sum of the projects, main aims of the projects, indicators used, planned project start, actual project start, planned project end, payment rate, and status of the projects. To get a structured and objective insight into the documented goals of the projects, we analysed the text based objective statements and indicators with text analysis techniques (word frequency analysis) and text analysis software (NVivo and Textrend Core 1.0). The methodology enabled us to identify the smallest components of the development objectives, compare them with the official policy’s goal system and the priority axes, and analyze them across governmental periods.

³ The website of the National Development Agency until 31 December 2013.

First we identified the 200 most frequent words (keywords) in the objective section of the feasibility studies in each of the project documentations, filtering out conjunctions and different forms of the same words. Then the authors decided (by majority rule) on the top 100 keywords, filtering out words of general meaning which could not be interpreted in the context of the eGovernment OPs. Two different lists of keywords were created, one for EPAOP and one for SROP. We visualized some of our findings in the form of word frequency based wordclouds (where font size directly reflects the differences in keyword frequency). We used the frequency of this final set of keywords as variables in the further analysis. (Coding: 0 = the keyword did not occur in the project objective; 1 = the keyword occurred once in the project objective; 2 = the keyword occurred more than once in the project objective.) We also created a weighted list of keyword frequencies, where the number of occurrences of the different keywords was weighted by the contract sum (in HUF million) of the projects. Multivariate statistical methods (cluster analysis) and statistical tests (comparing frequencies and means) were applied to get a more in-depth understanding of the implicit goal structure of the projects.
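The keyword coding and contract-sum weighting described in the methodology can be illustrated with a minimal sketch. This is not the authors' NVivo / Textrend workflow: the project texts, keyword list and contract sums are invented for illustration, whole-word matching stands in for the merging of different word forms, and "weighted frequency" is assumed to mean the occurrence count multiplied by the project's contract sum and summed over projects.

```python
import re
from collections import Counter


def count_keyword(text, keyword):
    """Count whole-word, case-insensitive occurrences of keyword in text."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def code_occurrence(count):
    """Apply the 0/1/2 coding: absent, occurs once, occurs more than once."""
    return min(count, 2)


def analyse(projects, keywords):
    """Return per-project codes, plain counts and contract-sum-weighted counts."""
    coded = []            # one {keyword: 0|1|2} dict per project
    plain = Counter()     # total occurrences per keyword
    weighted = Counter()  # occurrences weighted by contract sum (assumed scheme)
    for project in projects:
        counts = {k: count_keyword(project["objective"], k) for k in keywords}
        coded.append({k: code_occurrence(c) for k, c in counts.items()})
        for k, c in counts.items():
            plain[k] += c
            weighted[k] += c * project["contract_sum_huf_million"]
    return coded, plain, weighted


# Hypothetical example data, not taken from the study.
projects = [
    {"objective": "Develop a central data service; the data service supports clients.",
     "contract_sum_huf_million": 2000},
    {"objective": "Training system for public administration staff.",
     "contract_sum_huf_million": 500},
]
keywords = ["data", "service", "system", "training"]

coded, plain, weighted = analyse(projects, keywords)
print(coded)
print(weighted.most_common())
```

Capping the count at 2 keeps a single long objective statement from dominating the later comparisons, while the weighted list lets large-budget projects count for more than small ones.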


IV. DISCUSSION AND FINDINGS

In Table IV we summarize our key findings on keyword priorities of EPAOP and SROP objectives. Both the normal count and the weighted frequency are in alignment with the generally stated objectives of these program portfolios, and show coherence. It is interesting to notice, however, that the leading keyword in the contract sum weighted list of SROP is “data”, preceding the more general terms “organization”, “development” and “public administration”. From Table IV we can also observe the interplay between the initiatives of ICT based and human capacity based developments (EPAOP vs. SROP keywords): the former lists “systems”, “service” and “ICT support”, while the latter focuses on “data”, “organizations”, and “public administration support”.

TABLE IV. TOP 20 KEYWORDS IN EPAOP AND SROP

EPAOP keyword | Count | SROP keyword | Count
system | 108 | project | 59
project | 92 | objective | 55
service | 90 | public | 46
data | 83 | necessary | 32
development | 78 | organizational | 25
electronic | 72 | public administration | 18
formation | 63 | tender | 18
client | 48 | integrate | 16
effective | 47 | development | 15
support | 47 | objective | 15
administration | 47 | system | 13
integrate | 45 | new | 13
public administration | 40 | municipality | 12
relation | 40 | societal | 12
application | 37 | program | 11
registration | 37 | design | 11
central | 33 | form | 11
opportunity | 32 | state | 11
modern | 30 | effective | 11
total | 30 | adequate | 11

In order to adhere to the space limitation of the paper we present three findings from our analysis. Firstly, we discuss how Priorities 1 and 2 in both EPAOP and SROP goal systems are structured. Secondly, we look at the relationship between the goal structure of EPAOP and SROP projects. Thirdly, we re-aggregate the decomposed objective elements according to the timeline of the 2007-2013 planning period and draw conclusions on the modernization priorities of the three governments in this era.
TABLE V. EPAOP OFFICIAL PRIORITY GOALS VERSUS KEYWORDS

Priority | Priority goals in the official EPAOP document | Keywords based on objective text analysis (with priority-level frequency)
Priority 1.1 | Electronization of public administration services and to raise the level of transactions | Electronic (35); informatics (43); public administration (16); service (36); development (46); integrated (26), central (11); project (48); data (52); hardware (8); software (13); server (6)
Priority 1.1 | The renewal of the procedures and IT support of judicial system and the registry court | Renewal (5); procedure (19); execution (of penalty; 21); support (36); process (16); project (48); punishment (system)
Priority 1.1 | Setting-up service centres for local governments | Service (36), project (48), public administration (16); integrated (26); processes (16); relation (22)
Priority 1.1 | IT background of law enforcement, emergency organisations and public prosecution offices | Project (48); magisterial (14); informatics (43); public administration (16); execution (of penalty; 21); support (36); process (16); relation (22); development (46); hardware (8); software (13)
Priority 1.2 | Establish the central electronic services required for the efficient operation of public administration | Central (11); electronic (35); service (36); effective (25); public administration (16); hardware (8); software (13); server (6); project (48); infrastructure (12), integrated (26); network (6), support (36), development (46); modern (18); organisation (12)
Priority 1.2 | Establishment of data links among public administration systems | Data (52); relation (22); public administration (16); project (48); functional (9); processes (16)
Priority 1.2 | Implementation of electronic document management system. | Electronic (35); procedure (19); project (48); hardware (8); software (13); central (11); complex (11); database (11); service (36)
Priority 1.2 | Modernisation of the financial and economic operation processes. | Internal (11); processes (16); modern (18); service (36)
Priority 2.1 | Provision of service interface for clients. | Service (47), integrate (14), central (17), electronic (31), project (39), data (28), informatics (13), application (16), client (26)
Priority 2.1 | Central client interface services. | Central (17), client (26), service (47), development (24), electronic (31), integrate (14)
Priority 2.1 | Electronic payment system. | Electronic (31), project (39), integrate (14), central (17), client (26)
Priority 2.1 | Front office services, common territorial service centres, upgrade of government offices | Service (47), public administration (20), state (5), development (24), governmental (7), project (39), integrate (14)
Priority 2.2 | Development of the Central Electronic Service System and IT security infrastructure. | Development (24), central (17), electronic (31), service (47), safety (5), hardware (2); software (2); server (1); citizens (7), public administration (20), project (39), data (28), information (8), info-communicational (2)
Priority 2.3 | Electronic authentication of citizens | Certified (0), identification (3), citizens (7), processes (4), project (39), client (26), data (28), administration (23), centralised (1)


A. EPAOP objectives – ICT supported reorganisation of internal processes in public administration and access to public administration services

To illustrate the goal-consistencies between EPAOP and SROP projects through keyword connections, we compared the high level policy objectives of Priority axes 1 and 2 with the most frequent keywords. Table V summarises our results concerning the EPAOP projects.

In order to provide a visual demonstration of keyword frequencies and goal congruence we created word cloud figures of EPAOP for the two main priority axes:
• Priority axis 1 - Public administration and renewal of the internal processes of administrative services (Fig. 5), and
• Priority axis 2 - Financing projects and promoting access to public administration services (Fig. 6).

The most frequent EPAOP Priority 1 keywords were: “data”, “project”, “development”, “informatics”, “service”, “support”, “electronic”, “formation”. These words appeared in the analysed project goals 33-52 times. The less frequent words (which appear 1-4 times only) are usually related to special project topics, such as “land register”, “agricultural”, “taxpayers”. In addition to the keywords we can also look at the most important projects in terms of their allocated budgets. The most significant projects financed under EPAOP Priority 1 (i.e. projects with a budget above HUF 2 billion, or EUR 6.7 million) served the following objectives: “the modernisation of financial and economic operation processes”, “the efficient support of the work processes of public administration organisations”; “the implementation of monitoring and decision support systems” and “the IT development of the organisations providing back-office functions for public administration”. These fields are of key importance and have been highly emphasized among project goals.

Fig. 5. EPAOP Priority 1 goals wordcloud, based on keyword frequency, where font sizes reflect word frequency

As we can see in Table V, the keywords based on objective text analysis usually cover the sub-priorities’ goals, but there are some exceptions. In the documents of EPAOP the following objectives are also of key importance: “centres for local governments, the local public administration framework, ASP (Application Service Providers)” and the “implementation of electronic document management system”. We found during the word frequency analysis that these objectives cannot be prominently seen in the objective keyword frequency lists. The reason might be that ASPs and the electronic document management systems are financed from Priority 3 of EPAOP and will be realised only in the Central Hungary Region.

As visualised in Fig. 6, the most frequent keywords under EPAOP Priority 2 are “service”, “project”, “system” and “electronic” (with frequency above 30). As Priority 2 aims to develop citizens’ access to services, the role of “client” is of key importance: the keyword “client” occurred 26 times among project goals. The word “central” (17 occurrences) is also necessary to fulfil Priority 2 goals, as the systems should be implemented centrally and they should also be “integrated” (14 occurrences). Among the dedicated objectives, the improvement of user skills remained only on the level of plans, regardless of the fact that it should have been one of the key objectives of the programme in order to increase the level of usage of public electronic services.

Fig. 6. EPAOP Priority 2 goals wordcloud, based on keyword frequency, where font sizes reflect word frequency

Electronic authentication of citizens is not accentuated in the word cloud either, probably because the topic is covered by one main project (“Complex customer identification”, EUR 7.22 million), so the keyword frequency was not so high, but the term “identification” appears in Fig. 6.

B. SROP objectives – human resources capability development

As we described earlier, the State Reform Operational Programme supports the establishment of the organisational structure of institutions, followed by the human resources and procedural adoption of the new or improved organizational structures. The projects financed under the first 3 priority axes of SROP are closely related to eGovernment development in Hungary. In Table VI we compare the policy-level official goals of SROP Priority 1 and 2 with the findings of the text analysis of the objective statements. Fig. 7 and 8 show the frequency-based wordclouds visualising the SROP Priority 1 and 2 goal systems.

SROP Priority 1 focuses on the renewal of processes and organisation development in public administration. Under SROP Priority 1 objectives “public administration” was mentioned 26 times, “operation” 24 times, “law” 20 times. Other words occurred less than 20 times in the projects.


“Development”, “effective” and “organisational” occurred 19 times, “process”, “system” and “necessary” 16 times, and “support” 15 times. “Internal”, “integrate”, “governmental” and “services” also proved to be important keywords in SROP projects; these were mentioned 13 times.

TABLE VI. SROP OFFICIAL PRIORITY GOALS VERSUS KEYWORDS FROM OBJECTIVE TEXT ANALYSIS

Priority | Priority goals in the official SROP document | Keywords based on objective text analysis (with priority-level frequency)
Priority 1.1 | Improvement of the capacity for governance and local government | Training (1), effective (19), performance (8), governmental (13), project (38), operation (24), development (19)
Priority 1.1 | Raising the quality of legislation | Law (20), project (38), organizational (19), simplifications (5), process (16)
Priority 1.1 | Active involvement of the social partners | -
Priority 1.2 | Renewal of procedures and work processes as well as organisation development | New (12), process (16), organisational (19), development (19), project (38), problems (4), realisation (4)
Priority 1.2 | Transformation of the case handling administration procedures | Transformation (3), process (16), simplification (5), organisations (6), project (38)
Priority 1.2 | Development of the efficient and cost-effective organisations | Development (19), effective (19), organisational (19), culture (4), governmental (13), project (38)
Priority 2.1 | Establishment of open recruitment and an efficient internal replacement | Effective (7), internal (3), electronic (3), knowledge (4), public administration (28), training (21), system (14), project (17)
Priority 2.2 | Performance-based career pathways | Effective (7), training (21), career (4), project (17)

Regarding the most frequent keywords we can assume that the official SROP goals are generally well translated into planned project goals, since all of the keywords of Priority 1’s general objectives occur more than 10 times (Fig. 7): “new, processes, organisation and development”. It is interesting to see, however, that one of the main Priority 1 goals, “involvement of social partners”, cannot be seen in terms of frequent objective keywords. The authors’ assumption is that these objectives are left out due to other more important aims, and/or the related projects were under preparation or in the application phase during the data collection.

Fig. 7. SROP Priority 1 goals wordcloud, based on keyword frequency, where font sizes reflect word frequency

SROP Priority 2 aims to improve the quality of human resources in the public sector. The keyword frequency analysis shows that the official Priority 2 goals were usually covered by the project objectives. The most frequent keywords are (Fig. 8): “public administration”, “training”, “project”, “system”, “necessary”.

Fig. 8. SROP Priority 2 goals wordcloud, based on keyword frequency, where font sizes reflect word frequency

Summarising the results of our keyword frequency analysis of EPAOP and SROP projects, we can state that the keyword frequencies and wordclouds generally do illustrate the main policy level objectives of the Operative Programmes; however, there are some fields which are not emphasized among project goals, such as:
• the role of participation and social partners, and local government;
• improvement of user skills through public information campaigns.

We also have to reflect on the geographical coverage of the projects. Some project goals did not meet the broader objective of regional convergence; these were only applicable in the Central Hungary region. This poses a major challenge for the programming period of 2014-2020 in order to assure a countrywide coverage of development projects and goals. For example, the ASP project has been implemented in the Central Hungary Region, but it seems an important aim to assure ASP services countrywide, especially for productivity improvement in local governments. Another important priority area stands out from the analysis: electronic file and document management systems, where the aim is to widen the usage of these solutions countrywide.


Fig. 9. EPAOP-SROP mixed cluster analysis - displaying the most frequent keywords (average frequency > 0.5), and other descriptive characteristics
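The grouping behind a figure of this kind rests on comparing projects by their keyword-frequency profiles. The paper does not state which clustering algorithm or distance measure was used, so the following is only a sketch under assumptions: average-linkage agglomerative clustering with Manhattan distance over 0/1/2 coded keyword vectors. The project names, keyword order and vectors are hypothetical, and the toy size filter is lower than the N>3 threshold reported in the study.

```python
def manhattan(a, b):
    """Manhattan distance between two coded keyword vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def average_linkage(c1, c2, vectors):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(manhattan(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(vectors, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index_a, index_b) of the closest pair so far
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], vectors)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical coded vectors; keyword order: electronic, service, training, law.
projects = {
    "EPAOP-A": (2, 2, 0, 0),
    "EPAOP-B": (2, 1, 0, 0),
    "SROP-C": (1, 2, 0, 0),
    "SROP-D": (0, 0, 2, 1),
    "SROP-E": (0, 0, 2, 2),
}
names = list(projects)
vectors = [projects[n] for n in names]

clusters = agglomerate(vectors, n_clusters=2)
named = [sorted(names[i] for i in members) for members in clusters]
kept = [members for members in named if len(members) > 2]  # toy size filter
print(named)
print(kept)
```

In this toy data the first cluster mixes EPAOP and SROP projects because their objectives share the same service-oriented keywords, which is the kind of "mixed cluster" the analysis looks for.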

C. Harmonization of eGovernment goals in ICT versus human capacity related projects

After decomposing the goal structure of EPAOP and SROP projects, in order to identify the implicit goal hierarchy and compare it with the official policy targets, we used cluster analysis to reveal the emergent relationships of EPAOP and SROP projects based on similarities of their objective structure (described by the keyword frequencies). During the cluster analysis 14 clusters were identified, but only the ones with N>3 were included in the further analysis. These are shown in Fig. 9, presenting the number of projects, average budget, duration, and key objectives.

We can see that while some clusters include projects only from either EPAOP or SROP, there are some mixed clusters as well (Cluster 13, 10 and Cluster 3 in Fig. 9). This suggests that some EPAOP and SROP projects have a similar implicit goal structure, supporting the original policy level intention of financing technology support of SROP organisational development from EPAOP projects. One of these meta-clusters, Cluster 3, is characterized by general system development keywords only, while the mixed Cluster 13 is more focused on service development. From our analytical point of view the most interesting is Cluster 10, which includes projects from both operative programmes with prominent keywords like “electronic”, “service”, “development”, and characteristics like “central” and “effective”. By looking at the other attributes of this cluster we can say that these projects had short planned durations and long delays, and by the time of our data collection most of them were uncompleted.

Fig. 9 also illustrates that the largest projects tend to belong to the two smaller and more specific EPAOP clusters (Cluster 11 and 2), showing that higher average budget values go hand in hand with longer project durations and usually higher payment (and completion) rates as well.

D. Different aspects of modernization in the planning period

During our research design we assumed that governmental vision might influence the main objectives and other characteristics of eGovernment projects. Fig. 10 illustrates the differences of keyword frequency in the different governmental periods during 2007-2013.

In the periods 2007-2008 (“Gyurcsány Government”) and 2009-2010 (“Bajnai Government”) the development and IT support of the judicial systems was important, while these keywords became less dominant in the succeeding government periods. While a “strategic” approach was frequent in the “Gyurcsány Government”, some of the prevalent keywords in the “Orbán Government” were “integrate”, “opportunity”, “formation” and “realization”, suggesting a different, more execution oriented approach to eGovernment development.

In the first and third governmental cycles the average contract sum of projects was quite high (EUR 5.25 and 4.21 million), corresponding with project durations which were longer than 2 years on average, compared with the “Bajnai Government” period, which financed significantly smaller projects (EUR 2.67 million). It is interesting to note that the “Gyurcsány Government” started long projects (30 months on average), while the next two governments launched significantly shorter projects (22 and 20 months).

If we take into account the main implications in Fig. 1 and link them with the governmental periods, we can observe that after the “Bajnai Government” there was a relatively passive period in the implementation of EPAOP and SROP projects (between 2010 and the second half of 2012); the implementation continued only in the second half of 2012, with a slight shift in focus in terms of objectives, and also in beneficiaries


and project sizes. The biggest proportion of projects was launched in the period of the “Orbán Government”, both in number and value.

Fig. 10. Governments’ influence on EPAOP and SROP projects (only the significant differences displayed, α<10%)

V. CONCLUSION

The main objective of our research has been to explore the financial and timing characteristics of public ICT spending, and to analyse the implicit and explicit objective system of eGovernment projects in Hungary. Based on text analysis of two main operative programmes, the Electronic Public Administration Operative Programme and the State Reform Operative Programme (EPAOP and SROP), we found that keyword statistics are generally consistent with the main policy level objectives of the Operative Programmes; however, there are some fields which are not emphasized among project goals, such as the role of participation and social partners, local government, and improvement of user skills through public information campaigns.

The relationship between SROP and EPAOP goal structures (the human and technology focused aspects of eGovernment development) was also compared, and the results of cluster analysis demonstrate the consistency of goal structures of several EPAOP and SROP projects.

One of our most interesting findings was in terms of timing: contracting in EPAOP and SROP happened in two separate waves with a relatively passive period in 2010-2011, not independent of governmental cycles. A significant amount of funds was committed relatively late in the planning period, in the second half of 2012 and during 2013, indicating that governmental changes resulted in reconfiguration of the goal system.

We are aware of some limitations of our data collection and methodology regarding ICT-related public projects. These limitations stem from the following sources:
• We concentrate only on the IT projects in the public sector financed from Structural Funds, but in Hungary and in other EU countries there can be IT projects financed from other sources as well.
• Although Structural Funds are the most harmonized public investment schemes in the EU, more attention to harmonized data collection and analysis could provide extremely valuable input for economic impact assessment of ICT projects. At present, comparative data is still limited: member states do not collect and offer data and information on public IT projects in the same or similar structure.
• Our research has focused on the policy objectives and project deliverables, not on execution and actual results, so this is only the first step toward mapping the e-government value creation process.

These limitations offer directions for further research about public ICT project effectiveness: research should continue data collection concerning the execution phase, examining the consistency of objectives with the actual deliverables and outcomes. Another extension of our “goal hierarchy” approach might be the wider European comparison of such policy-project consistency analysis research endeavours to explore e-cohesion at a multinational scale.

REFERENCES
[1] European Commission, Delivering on the European Advantage? “How European Governments can and should Benefit from innovative public services” e-Government Benchmark, DG Communications, Networks, Content and Technology, 2014
[2] United Nations E-Government Survey 2014, Department of Economic and Social Affairs, United Nations, 2014
[3] European Commission. Scoreboard 2014 - Developments in eGovernment in the EU 2014. 2014. https://ec.europa.eu/digital-agenda/en/news/scoreboard-2014-developments-egovernment-eu-2014. Retrieved: 26/11/2014
[4] Sasvari, P. The macroeconomic effect of the information and communication technology in Hungary. International Journal of Advanced Computer Science and Applications, 2011. 2 (12), 75-81.
[5] Lips, M. E-Government is dead: Long live Public Administration 2.0. Information Polity. 2012. 17 (3-4), 239–250.
[6] Barr, N. The Welfare State as Piggy Bank, Information, Risk, Uncertainty, and the Role of the State, Oxford University Press, 2001.


[7] Bannister, F., Connolly, R. Forward to the past: Lessons for the future of e-government from the story so far. Information Polity. 2012. 17 (3-4), 211–226.
[8] Kanaan, R., Kanaan, G. The Failure of E-government in Jordan to Fulfill Potential. International Journal of Advanced Computer Science & Applications. 2013. 4(12). 157-161.
[9] Sensuse, D. I., Ramadhan, A. The Relationships of Soft Systems Methodology (SSM), Business Process Modeling and e-Government. International Journal of Advanced Computer Science and Applications, 2012. 3(1).
[10] Brynjolfsson, E., Hitt, L. M., Yang, S. Intangible assets: Computers and organizational capital. Brookings Papers on Economic Activity. 2002/1. 137-198.
[11] European Commission. Report on the Implementation of the Electronic Public Administration Operational Programme in 2012. Annual Implementation Report 2012. Approved by the EPAOP Monitoring Committee 11 June 2013. Submitted to the European Commission by 30 June 2013. Website of DG Region: ec.europa.eu/regional_policy/country/prordn/details_new.cfm?gv_PAY=HU&gv_reg=ALL&gv_PGM=1181&LAN=7&gv_per=2&gv_defL=7 Retrieved: 10/11/2013


Investigating the Idiotop Paratop Interaction in the Artificial Immune Networks
Hossam Meshref, Member IEEE
Computer Science Department
College of Computers and Information Technology
Taif University, Taif, Saudi Arabia

Abstract—The artificial immune system is a new computational intelligence technique that has been investigated for the past decade. A review of the literature yielded two observations that could affect the network learning process. First, most researchers do not focus on Paratop-Epitop and Paratop-Idiotop interactions within the network. Second, most researchers depict the interaction within the network with all the network components present from the beginning until the end of the learning process. This research addresses both observations. The findings differentiate between interactions in a node within a network and total interactions in the network. A small simulation problem was used to show the effect of choosing a steady number of antibodies during network interactions. Results showed that a considerable number of interactions could be saved during network learning, which leads to faster convergence. In conclusion, it is believed that the designed model is ready to be used in many applications. Therefore, we recommend the use of our model in applications such as controlling robots in hazardous rescue environments to save human lives.

Keywords—Artificial Immune Systems; Idiotopic Networks; Paratop-Epitop; Paratop-Idiotop

I. INTRODUCTION

The artificial immune system is a young field, and many researchers are trying to explore its boundaries to utilize it in different applications. However, investing research effort in understanding biological processes through modeling is not new. For example, artificial neural networks have been investigated thoroughly in the past three decades, and a plethora of applications has been developed based on their modeling.

The artificial immune system is modeled after the biological immune system, which has many useful features. For example, it adapts to changes in an environment, it has a hierarchical organization, and its control is distributed. The biological immune system consists mainly of lymphocytes, which have two major types: T-cells and B-cells. B-cells are responsible for humoral immunity, in which antibodies are secreted. T-cells, on the other hand, are responsible for cell-mediated immunity. Each B-cell has a unique structure that produces suitable antibodies in response to invaders of the system. That type of response is called adaptive immunity, and it eventually results in antibody-antigen relations being stored in case the host encounters the same invader again. In that case, the immune response is expected to be faster, given that the network has seen the invader before, i.e. has learned how to deal with it [1].

Early researchers discovered that the immune system has idiotopic networks that use stimulation and suppression among their elements to achieve immunity against antigens (Ags). The part of an antigen that can be recognized by antibodies is called the epitope (Ep). It is worth mentioning that a regular Ag may carry more than one epitope, and the result of this stimulation is that the B-cells start to produce antibodies (Abs). On the other hand, the part of the antibody that can recognize epitopes of antigens is called the paratope (P). Surprisingly, a part of an antibody, called the idiotope (Id), can be regarded as an antigen by other antibodies in the idiotopic network [2].

The basic idea of the Idiotopic Network Hypothesis is that the lymphocytes in this network communicate with each other mutually to gain immunity. An Ag stimulates its matching paratops in a key-and-lock relation. In addition, the idiotop Id1 of Ab1 will stimulate the paratop P2 of B-cell 2, which has Ab2 attached to it. From the point of view of B-cell 2, Id1 acts as an Ag. Consequently, Ab1 with B-cell 1 will be suppressed by Ab2. From another perspective, Ab2 stimulates Ab1 through Id2 [3]. All network members mutually stimulate and suppress each other in a closed-loop chain that acts as a self/non-self recognizer, and eventually the suitable response to the Ag will evolve; see figure 1 (a, b).

Fig. 1. (a) Antibody structure (b) Stimulation and suppression in Idiotopic networks

This paper is structured as follows. Section two gives an overview of the background research found in the literature. Section three presents an in-depth investigation of the interactions within a node. Finally, section four discusses the research results along with some research directions for interested researchers who may want to pursue their research using the presented findings.


II. BACKGROUND

In this research, the interaction within the artificial immune network is investigated to understand its dynamics. The main focus is on modeling the interaction between paratops and idiotops (P-I) as well as the interactions between paratops and epitops (P-Ep) in the network. It is believed that these interactions govern the behavior of the network and affect its learning ability. The interest of this proposed research is to investigate the nature of these interactions and how they affect the artificial immune network performance.

Most of the research found in the literature models the interaction within the network based on the assumption that the network has n components from the beginning of the simulation until the end. Take applications in the field of robotics, for example. Dasgupta and Nino surveyed robotics research inspired by Artificial Immune Networks. Among the research done on autonomous robots, a few robots utilized a mechanism called Immunoid to collect various amounts of garbage from a constantly changing environment [4]. Singh and Thayer used a computer simulation of mobile robots trying to clear mines from a minefield using a response pattern based on the immune system [5]. Lau and Ko used a general suppression control framework that is based on the suppression mechanism of immune cells. Their target was to design a robot search and rescue system [6,7]. However, in the aforementioned research, there was no deep investigation of the nature of the P-Ep and the P-I interactions in the network.

On the other hand, there were many applications that benefit from a few immune system features such as self/non-self discrimination, specificity, and memory. Several research efforts took advantage of such networks in the field of autonomous robots, e.g. [8,9]. The main idea behind that research was a network that adapts itself by adjusting the concentrations of its nodes, the Abs in each B-Cell, in a way that fulfills the overall objective function of the network. Each node in that network was a robot with a certain set of behaviors, and collectively the network adapts the robots' behaviors as they collaborate to achieve an overall common goal. However, there was no clear distinction between the P-Ep and the P-I interactions in the network. Other applied research in the literature used characteristic algorithms of the immune system such as the negative selection algorithm and the clonal selection algorithm [10, 11]. In addition, there was research that utilized the pattern recognition ability of the immune system [12, 13, 14] and many other features.

In the surveyed research efforts, all robots' behaviors (Abs) were assumed to exist in the network dynamics from the start of the simulation process. In this research, this assumption is seen differently. It is believed that a node, a B-Cell, in the network starts at t1 with an Ag and one corresponding antibody Ab1. Next, at t2, as another antibody Ab2 is added to the network, the Idiotop of Ab1 recognizes Ab2 as an Ag as well. This means Ab2 receives stimulation from the Ag as well as from the Idiotop of Ab1. On the other hand, Ab2 suppresses the Ag as well as the Idiotop of Ab1.

The process is repeated as a new antibody Ab3 is added to the network at t3. Eventually, as the size of the network increases, the stimulation and suppression levels should also increase in an augmenting manner. In our proposed research, we need to investigate that augmentation effect in depth to see the difference that it makes to the network performance. We need to find out whether or not that augmentation effect will improve the learning process. Unpacking the network structure concept will help us, as well as other researchers, to reconfigure artificial immune networks so that they have a more robust design. That design is expected to be capable of handling many challenges in the computational intelligence field.

To summarize, two assumptions are investigated in this proposed research. First, we assume that there is a differentiation between P-Ep and P-I interactions in the network. Second, we assume that the interaction within the network is augmentative and that it may affect training. The interest of our research is to investigate those hypotheses in order to build a more robust network. That network, in turn, should produce results that are more accurate. The following section discusses the design part to show how the proposed research is conducted.

III. INVESTIGATING INTERACTIONS WITHIN THE IDIOTOPIC IMMUNE NETWORKS

A. Building up the Augmentation Concept During Node Interactions

The main idea in our proposed research is based upon assuming that the node interactions increase in an augmentative manner. Table 1 shows the corresponding node interactions at different simulation times. The node structure, including stimulation and suppression links, from time t1 to t3 is shown in figures 2-4. In this research, this interaction build-up process is called the Augmentation Effect.

Fig. 2. Node interactions at t1
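The build-up described in Section III.A can be illustrated with a minimal sketch. It assumes, for illustration only, that one antibody is added per time step and that each pair of antibodies contributes a single P-I link; the function names are hypothetical and not part of the original model.

```python
from itertools import combinations


def node_links(n_antibodies):
    """Enumerate the links of a node that currently holds n antibodies."""
    antibodies = [f"Ab{i}" for i in range(1, n_antibodies + 1)]
    # P-Ep links: the paratope of every antibody interacts with the antigen.
    p_ep = [("Ag", ab) for ab in antibodies]
    # P-I links: every unordered pair of antibodies interacts (assumption:
    # each pair is counted once).
    p_i = list(combinations(antibodies, 2))
    return p_ep, p_i


def build_up(steps):
    """Add one antibody per time step and record the link counts."""
    history = []
    for t in range(1, steps + 1):
        p_ep, p_i = node_links(t)
        history.append((t, len(p_ep), len(p_i)))
    return history


for t, n_p_ep, n_p_i in build_up(3):
    print(f"t{t}: antibodies={t}, P-Ep links={n_p_ep}, P-I links={n_p_i}")
```

Under these assumptions the P-Ep links grow by one per step, while the P-I links grow by the number of antibodies already present, which is the augmentation the section describes.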


Fig. 3. Node interactions at t2

Fig. 4. Node interactions at t3

B. Concentrations within the Idiotopic Network

Most researchers agree that interactions within the idiotopic immune network can be governed by equation 1:

$$\frac{dC_i}{dt} = \left( \sum_{j=1}^{N} m_{ji} C_j - \sum_{k=1}^{N} m_{ik} C_k + m_i - k_i \right) C_i \tag{1}$$

The first term, from the left, represents the total stimulation between different B-cells in the idiotopic network. The second term represents the total suppression. The third term represents the stimulation an Ab receives from an Ag, based on the affinity between them. The fourth term, k, represents the mortality of an Ab [15]. N represents the number of Abs, C represents the concentration of an Ab within the network, and m represents affinity.

To model concentrations for antigens and antibodies, theoretical biology should be involved. According to de Boer, population growth can be described by a classical logistic equation [16]. Therefore, in this research, antibody concentration is modeled mathematically using equations 2 and 3:

$$\frac{dC}{dt} = r\,C \left( 1 - \frac{C}{K} \right) \tag{2}$$

$$C(t) = \frac{K\,C_0}{C_0 + (K - C_0)\,e^{-rt}} \tag{3}$$

where $C_0$ is the initial concentration of the Ab population, $K$ is the carrying capacity of the population, and $r$ is the natural rate of increase, represented as the difference between the birth rate, b, and the death rate, d.

The mathematical representation of the Ag population is in general similar to the abovementioned Ab representation. However, in this research, the Ag is not assumed to proliferate. Unlike the Ab concentration equation, it is expected that as t→∞ the Ag concentration will approach zero ($C_{Ag} \to 0$). The Ag concentration could be modeled mathematically using equation 4:

(4)

All the aforementioned relations and algorithms are deployed in the investigation of the relations within the Artificial Idiotopic Immune Network. The following section presents our findings regarding the interactions within a node in the network as well as the interactions within the network as a whole.

IV. RESULTS AND DISCUSSION

A. Investigating the overall number of interactions in a B-cell node within the Network

As mentioned earlier, in most of the surveyed literature, ranging from 2005 to 2013, there was no differentiation between P-Ep and P-I interactions within the network. In addition, most of the literature depicts the interaction within the node with all the node components present from the start of the simulation.

Based on the augmentative interaction details shown in table 1, we find that the number of P-Ep interactions is n, which is the number of antibodies in the network. On the other hand, we find that the number of P-I interactions is $\binom{n}{2}$, assuming that there is no repetition and that order does not matter; see figures 5 and 6.

As illustrated in figure 7, by combining the number of P-Ep interactions, $I_{P\text{-}Ep}$, and the number of P-I interactions, $I_{P\text{-}I}$, within the artificial idiotopic network on one graph, we arrive at the following observations:

1) The numbers of interactions, $I_{P\text{-}Ep}$ and $I_{P\text{-}I}$, are equal at n=3.
2) In addition, for n<3, $I_{P\text{-}Ep}$ interactions are greater than $I_{P\text{-}I}$ interactions.
3) Conversely, for n>3, $I_{P\text{-}I}$ interactions are greater than $I_{P\text{-}Ep}$ interactions.
4) By investigating the total number of interactions, $I_T$, within the node, we find that:

$$I_T = I_{P\text{-}Ep} + I_{P\text{-}I} = n + \binom{n}{2} = \frac{n(n+1)}{2} \tag{5}$$
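The interaction counts of Section IV.A can be checked numerically. The sketch below assumes the counts stated in the text (n P-Ep interactions and "n choose 2" P-I interactions); the helper names are illustrative.

```python
from math import comb


def interaction_counts(n):
    """Return (P-Ep, P-I, total) interaction counts for n antibodies."""
    p_ep = n            # one paratope-epitope interaction per antibody
    p_i = comb(n, 2)    # unordered antibody pairs, no repetition
    return p_ep, p_i, p_ep + p_i


def crossover(max_n=20):
    """Smallest n at which the P-Ep and P-I counts are equal."""
    for n in range(1, max_n + 1):
        p_ep, p_i, _ = interaction_counts(n)
        if p_ep == p_i:
            return n
    return None


for n in range(1, 7):
    p_ep, p_i, total = interaction_counts(n)
    print(f"n={n}: P-Ep={p_ep}, P-I={p_i}, total={total}")
print("counts are equal at n =", crossover())
```

The table this prints shows the P-Ep count leading for n<3, the two counts meeting at n=3, and the P-I count dominating afterwards.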


In this research, it is believed that the P-I interaction continues to have the augmentation effect until the node is settled. In our design, it is expected that at a certain point a number of antibodies n* will be enough to defeat the encountered antigen(s). At that time, there will be a steady pool of antibodies to select from based on the highest concentrations. Later on during learning, as the learning process converges and the target is reached, memory cells are generated for future encounters of that antigen. That concept was investigated further to control the interactions within a node, and eventually to deal with the augmentation effect.
Fig. 5. The number of P-Ep interactions

Fig. 6. The number of P-I interactions

It was found that it is not practical to limit the number of generated Abs during learning, because that would limit the chances of producing a proper solution that counters the Ag effect. One possible approach in this research was to control the difference between the number of P-Ep interactions, $I_{P\text{-}Ep}$, and the number of P-I interactions, $I_{P\text{-}I}$, within the artificial idiotopic network. By doing that, the number of interactions during learning could be controlled, which could eventually reduce the augmentation effect. The difference in interactions, $I_d$, could be analyzed using equation 6:

$$I_d = I_{P\text{-}I} - I_{P\text{-}Ep} = \binom{n}{2} - n \tag{6}$$

To be more precise, we can describe the difference in interactions for different time intervals using equation 7:

$$I_d = \begin{cases} n - \binom{n}{2}, & n \le 3 \\ \binom{n}{2} - n, & n > 3 \end{cases} \tag{7}$$

which could be modeled using the following equation:

$$I_d = \left| \binom{n}{2} - n \right| \tag{8}$$

It was found that the difference is steady until t2, and then the network reaches a balance point at n=3, where the $I_{P\text{-}Ep}$ and $I_{P\text{-}I}$ interactions are equal. For n>3, the difference in interactions increases in an augmentative manner; see figure 8.
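As a small check of the behavior just described, the sketch below evaluates the difference in interactions for the first few values of n. It assumes the difference is the gap between the P-I count ("n choose 2") and the P-Ep count (n); the function names are illustrative.

```python
from math import comb


def difference_piecewise(n):
    """Difference in interactions, written per interval around n = 3."""
    if n <= 3:
        return n - comb(n, 2)   # P-Ep count is at least the P-I count
    return comb(n, 2) - n       # P-I count dominates


def difference_abs(n):
    """The same difference written with an absolute value."""
    return abs(comb(n, 2) - n)


differences = [difference_abs(n) for n in range(1, 7)]
print(differences)
```

Under this assumption the difference stays at 1 for n=1 and n=2, drops to 0 at the balance point n=3, and then grows faster with every added antibody.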
Fig. 7. Combined network interactions

As we focus on solving the augmentation problem to reach steady-state interactions, it is believed that one possible solution is to ensure that the difference $I_d$ is not increasing, so that the number of interactions is controlled. That goal could be achieved by keeping the number of antibodies in the network constant. In other words, the death and birth rates, d and b, should be controlled to have a steady number of antibodies all the time, as in equation 9:

(9)
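The effect of balancing the birth and death rates can be sketched with the classical logistic model that the paper adopts for antibody concentration. The closed-form solution used below is the standard one for logistic growth with r = b - d; the parameter values are arbitrary and chosen only for illustration.

```python
import math


def logistic_concentration(t, c0, k, b, d):
    """Closed-form logistic solution with natural rate of increase r = b - d."""
    r = b - d
    return k * c0 / (c0 + (k - c0) * math.exp(-r * t))


times = range(0, 21, 5)

# Birth rate above death rate: the concentration grows toward the capacity k.
growing = [logistic_concentration(t, c0=1.0, k=10.0, b=0.9, d=0.4) for t in times]

# Equal birth and death rates: r = 0, so the concentration stays at c0.
steady = [logistic_concentration(t, c0=1.0, k=10.0, b=0.5, d=0.5) for t in times]

print([round(c, 3) for c in growing])
print([round(c, 3) for c in steady])
```

With b = d the exponential factor equals one for every t, so the population neither grows nor shrinks, which is the steady pool of antibodies the text aims for.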


Fig. 8. Difference in interactions within a node

Based on the abovementioned assumption, we assume that as n increases beyond 3, the difference in interactions should increase until n reaches a steady-state value n*. At that point, the node is presumably stable; see figure 9 for an example.

Fig. 9. Simplified P-Ep node interactions

Based on figure 9, we could still describe the difference in interactions within a node for different time intervals using equation 10:

(10)

By investigating the overall interactions within a node in the immune network, we reach the following findings:

1) Each node (B-Cell) has its own P-I and P-Ep interactions, which are calculated using different equations.
2) The augmentation effect occurs only within each node. That counters the initial research hypothesis that we assumed at the beginning. However, if we consider other B-Cells in the overall network at the beginning of interactions, the number of P-I and P-Ep interactions will be greater than the values mentioned in table 1. To understand how this is actually implemented, consider the case where the network has multiple B-Cells. At t1 we could then have multiple Abs. For example, there would be m Abs, where m is the number of B-Cells in the network, assuming that each B-Cell (node) starts with one Ab in its interaction. At that point, there will be more than just one P-Ep interaction to start with at t1.

B. Investigating the overall number of interactions in the network

Based on the aforementioned findings, we concluded that the total interactions within the node could be given by:

$$I_T = n + \binom{n}{2} = \frac{n(n+1)}{2}$$

where n is the number of antibodies in a node. Therefore, the summation of all the antibody interactions in m nodes in the network could be:

$$I_N = \sum_{i=1}^{m} \frac{n_i (n_i + 1)}{2} \tag{11}$$

A special case occurs when each B-Cell has the same number of Abs (behaviors). In that case:

$$I_N = m \, \frac{n(n+1)}{2} \tag{12}$$

(13)

C. Testing the Findings on a Simple Robotics Simulation Problem

In the general case, we should have m robots (B-Cells) cooperating in an environment to achieve a certain goal. Each robot is assumed to have n behaviors (antibodies). It is recommended to have a small number of behaviors to be able to study the effect on learning. Therefore, in this simulation we choose a simple problem of a robot that has a small number of behaviors. The reason behind that choice is to focus on the analysis of the number of interactions that could be saved during simulation. In this simulation problem, we have one robot. That robot has four behaviors: move west, move east, move north, and move south. Each behavior has its own concentration. The robot attempts to reach a target area autonomously by trying to minimize the distance-to-target value.

Using the total interactions equation in this simulation, while choosing n* = 2, resulted in a total number of 3 interactions within the network. In this simulation problem, the selection is based on choosing n* = 2, which means that the two behaviors with the highest concentrations were selected from the pool of four possible behaviors. Choosing n* = 2 out of n = 4, we saved 70% of the number of calculations during simulation. Figure 10 shows the change in behavior concentrations during simulation until the robot reached the desired target location.
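The savings reported for the robot simulation can be reproduced with a short calculation. The sketch assumes the total-interaction count n(n+1)/2 per node from Section IV; the concentration values are made up for illustration and are not taken from the paper's simulation.

```python
def total_interactions(n):
    """Total P-Ep plus P-I interactions in a node with n antibodies."""
    return n * (n + 1) // 2


def saved_percent(n, n_star):
    """Share of interactions avoided by keeping n_star of n behaviors."""
    full = total_interactions(n)
    return 100.0 * (full - total_interactions(n_star)) / full


def select_behaviors(concentrations, n_star):
    """Keep the n_star behaviors with the highest concentrations."""
    ranked = sorted(concentrations, key=concentrations.get, reverse=True)
    return ranked[:n_star]


# Hypothetical concentrations for the four behaviors of the robot.
concentrations = {"west": 0.10, "east": 0.45, "north": 0.30, "south": 0.15}

print(select_behaviors(concentrations, 2))
print(total_interactions(2), saved_percent(4, 2), saved_percent(3, 2))
```

Keeping two of four behaviors leaves 3 of the original 10 interactions (70% saved), and keeping two of three leaves 3 of 6 (50% saved), matching the figures quoted in the text.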


Fig. 10. Behavior concentrations within the IAIN

By running different simulations, it was found that as n increases, the percentage of the total number of interactions saved, S, increases based on equation 14:

$$S = \frac{I_T(n) - I_T(n^*)}{I_T(n)} \times 100\% \tag{14}$$

where, for example, S = 70% for n = 4 and n* = 2, based on the run of the previous simulation. The preceding findings show that the choice of n* out of n from the pool of possible behaviors affects learning simulation time, because calculation steps are saved. In another case, choosing two behaviors (n* = 2) out of a pool of three behaviors could save 50% of the number of calculations. Figure 11 shows that the percentage of calculations saved increased and, as a result, we could have faster convergence as the learning speed increased.

Fig. 11. Percentage of saved interactions

V. CONCLUSIONS AND RECOMMENDATIONS

Artificial immune networks have been a reliable model for the past decade. In this research, a new approach was used, based on unpacking the network ideology regarding the number of interactions within the network. In addition, the careful design scaffolding process that we implemented in analyzing the augmentation effect within a node resulted in several useful equations that describe different types of node interactions, and eventually resulted in a more robust network.

The resulting interaction model suits different applications that deploy intelligent behavior. It is recommended to use the designed network model to build applications in the robotics field, for example, where robots can be used in rescue operations. These types of applications can intervene in hazardous situations and save many human lives.

REFERENCES

[1] Ehud Lamm, Ron Unger, Biological Computation: Chapman and Hall/CRC Publishing, 2011.
[2] Leandro Nunes de Castro, Jonathan Timmis, Artificial Immune Systems: A New Computational Intelligence Approach: Springer, 2002.
[3] Hongwei Mo, Handbook of Research on Artificial Immune Systems and Natural Computing: Applying Complex Adaptive Technologies: IGI Global Publisher, 2009.
[4] Dipankar Dasgupta, Luis F. Niño, Immunological Computation: Theory and application. New York, NY: CRC Press, 2009.
[5] D. Dasgupta, “Advances in Artificial Immune Systems,” IEEE Computational Intelligence Magazine, Vol. 1, No. 4, pp. 40-49, 2006.
[6] Singh, S. P. N. and S. M. Thayer, “Immunology directed methods for distributed robotics: A novel, immunity-based architecture for robust control & coordination.” The Proceedings of SPIE: Mobile Robots XVI, Vol. 4573, November 2001.
[7] Lau, H. Y. K., and A. Ko, “An immuno robotic system for humanitarian search and rescue.” ICARIS'07 Proceedings of the 6th international conference on Artificial immune systems, pp. 191-203, 2007.
[8] Meshref, H., "The Rod-Bearing Problem: A Cooperative Autonomous Robotics Application Based on Artificial Idiotopic Immune Networks," Systems, Man, and Cybernetics (SMC), 2013 IEEE International Conference on, pp. 1654-1659, 2013.
[9] Yang He; Liang Yiwen; Li Tao; Zhou Ting, "A model of Collaborative Artificial Immune System," Informatics in Control, Automation and Robotics (CAR), 2010 2nd International Asia Conference on, vol. 3, pp. 101-104, 2010.
[10] Ying Tan; Guyue Mi; Yuanchun Zhu; Chao Deng, "Artificial immune system based methods for spam filtering," Circuits and Systems (ISCAS), 2013 IEEE International Symposium on, pp. 2484-2488, 19-23 May 2013.
[11] Purbasari, A.; Iping, S.S.; Santoso, O.S.; Mandala, R., "Designing Artificial Immune System Based on Clonal Selection: Using Agent-Based Modeling Approach," Modelling Symposium (AMS), 2013 7th Asia, pp. 11-15, 2013.
[12] Nemmour, H.; Chibani, Y., "Off-line signature verification using artificial immune recognition system," Electronics, Computer and Computation (ICECCO), 2013 International Conference on, pp. 164-167, 7-9 Nov. 2013.
[13] Yu Xiao; Fu Dongmei; Yang Tao, "Application of artificial immune algorithm in image segmentation based on immune field," Intelligent Control and Automation (WCICA), 2012 10th World Congress on, pp. 4691-4695, 6-8 July 2012.
[14] Simon M. Garrett, “How Do We Evaluate Artificial Immune Systems?” Journal of Evolutionary Computation, 13(2), pp. 145-178, 2005.
[15] Farmer, J. D., N. H. Packard and A. S. Perelson. The immune system, adaptation and machine learning. Physica, 22D, 187–204, 1986.
[16] Rob J. de Boer. Theoretical Biology & Bioinformatics: Utrecht University Press, 2013.


Using Object-Relational Mapping to Create the Distributed Databases in a Hybrid Cloud Infrastructure

Oleg Lukyanchikov
Moscow state university of instrument engineering and computer science
20, Stromynka str., Moscow

Simon Payain
Moscow Technological Institute
38A, Leninckiy pr., Moscow, Russia

Evgeniy Pluzhnik
Moscow Technological Institute
38A, Leninckiy pr., Moscow, Russia

Evgeny Nikulchev
Moscow Technological Institute
38A, Leninckiy pr., Moscow, Russia

Abstract—One of the current challenges in the use of cloud services is the design of data management systems. This is especially important for hybrid systems, in which the data are located in both public and private clouds. Monitoring, querying, scheduling and processing functions must be properly implemented in software and form an integral part of the system. To provide these functions, it is proposed to use object-relational mapping (ORM). The article presents an approach to designing databases for information systems hosted in a hybrid cloud infrastructure. It also provides an example of the development of an ORM library.

Keywords—cloud database; object-relational mapping; data management; cloud services; hybrid cloud

I. INTRODUCTION

Advanced applications operate on big data held in different stores. Cloud computing and cloud data storage are evolving rapidly; their advantages include performance gains from parallel computing, the use of virtualization technology, scalable computing resources, and data access via a web interface. Therefore, migrating existing systems and databases (DB) to the cloud is a pressing task.

Many are now concerned with taking full advantage of cloud services [8]. So far, however, migration of existing systems to the cloud only creates problems. Security issues of data access and QoS (Quality of Service) can be addressed by using a hybrid cloud: the part of the data that requires large computational cost and is not confidential is placed in public cloud services, and the rest in a private cloud or a local network infrastructure. However, specialized design principles for such cloud systems have not yet been developed. This task is formulated theoretically in [2, 6]. There are solutions for specific applications [4, 7, 15].

Our research is aimed at creating general principles for designing effective hybrid cloud systems. The difficulty in building design techniques is that the parameters of clouds and query algorithms cannot be estimated in advance: in each case different amounts of cloud services are purchased, and the routes and characteristics of communication channels are unknown. Currently, in the absence of developed general principles and techniques, the only way to study the problem is to conduct experiments.

An experimental laboratory bench was built to simulate the operation of a hybrid cloud (Fig. 1). Some experimental results are described in [9, 10]. The VMware vCloud software used allows cloud computing to be organized at all levels. To create the cloud, VMware ESXi was installed on two servers of the bench, the vCenter management system was set up, and VMware vCloud Director was installed. The bench involves more than 15 physical Cisco switches (29 series) and routers (26 and 28 series), as well as virtual Nexus switches. The system makes it possible to simulate data access routes and convergent-divergent channels (including dynamic ones) [13].

Fig. 1. Laboratory bench

One of the key scientific hypotheses is the need to ensure the structural stability of the distributed system. For this purpose, it is proposed to use positive feedback and dynamic


models in the state space [11, 12]. Studies on the construction implemented methods that perform select, insert, change, and
and identification of controlled dynamic models in the state delete the object.
space were conducted in a number of papers [3, 5], but not Client
widespread, as applied at the hardware level.
To implement the principles of database design must select
a technology system design, providing rapid change in the
course of the experiments, automating the processing of
distributed hybrid cloud data [14]. To provide these functions
is proposed to use an object-relational mapping (ORM). ORM
II. TECHNOLOGY INTERACTION WITH A DATABASE
With the development of the structure of the interaction of
information systems and database applications, led to the
emergence of technologies such as Open Database
Connectivity (ODBC), Data Access Object (DAO), Borland XML
Database Engine (BDE), provides a common programming
interface for working with various databases.
Further development needs of software and hardware DataBase in
Local XML file DataBase in
working with data require access to SQL does not store data, Private Cloud Public Cloud

and e-mail and directory service. To provide these features Fig. 2. Interaction with distributed repositories through the ORM
appeared technology Object Linking and Embedding, Database
(OLEDB) and ActiveX Data Objects (ADO). The advent of
powerful Frameworks, such as .Net and Qt, data processing
technologies becomes embedded in the database, providing full
integration with them, as well as integration with
semistructured data in XML, which has become a common
format for storing data in files.
However, with the development of technology for
interaction with the database, software developers generally have to write SQL queries to perform data operations, and as the technology develops, the complexity of query design increases.

Given the widespread use of object-oriented development methodology for application systems and, at the same time, the dominant market position of RDBMSs, an attractive solution is middleware that provides the necessary object-oriented interface to data stored under the control of a relational database [1]. Indeed, it is much more convenient for a developer to handle objects, since the code is written mainly in object-oriented programming languages.

To connect relational data with the objects that the software under development works with, object-relational mapping (ORM) technology was selected [1] (Fig. 2).

The essence of this technology is to put a relational database entity in correspondence with a programming object, i.e. each field of a table is assigned to a class attribute of the object. An example of the mapping of the entity "student" is shown in Fig. 3.

Fig. 3. An example of an object-relational mapping

In the example shown in Fig. 3, the fields of the table «Students» (id_student, a unique student ID; surname, the surname of the student; name, the name of the student; birthday, the student's date of birth; agv_sorce, the student's GPA) are mapped to the corresponding attributes of the class «Student». After this mapping, data processing methods are built into the object. Thus, a programmer who works with the object has no need to build complex SQL structures, including those that address distributed data.

III. DESIGNING ORM-LIBRARY FOR DISTRIBUTED DATABASE

ORM middleware is widely used, for example in implementations such as QxORM, EntityFramework, Dapper and others. But all these technologies can be used effectively only when the data are stored and managed in a single database, because their functionality is not sufficient to provide a convenient programming interface for working with all the necessary data when they are stored in different databases.

The use of ORM technology makes it possible to automate the control of data location. With a classic design, the developer must specify the data location in every request, and the software must connect to and disconnect from the database each time; all this increases the complexity of the design and leads to errors in the code. ORM makes it possible to include in each entity an attribute that is responsible for the physical location of the data in a distributed system. As an example, this attribute is marked in red in Fig. 4.
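The idea of an entity that carries its own location attribute can be illustrated with a minimal Python sketch. The column names follow the «Students» table described in the text; the `location` field, its values and the helper `row_to_student` are assumptions made for illustration only and are not taken from the library itself.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass
class Student:
    """Object-side image of one row of the «Students» table."""
    id_student: int            # unique student ID
    surname: str
    name: str
    birthday: date
    agv_sorce: float           # GPA; column name kept as spelled in the text
    location: str = "local"    # hypothetical: node that physically stores the row


def row_to_student(row: dict, location: str) -> Student:
    """Build a Student from a fetched row and record where the row came from."""
    column_names = [f.name for f in fields(Student) if f.name != "location"]
    return Student(**{c: row[c] for c in column_names}, location=location)


row = {"id_student": 7, "surname": "Ivanov", "name": "Petr",
       "birthday": date(1995, 3, 14), "agv_sorce": 4.5}
student = row_to_student(row, location="cloud-db")
print(student.surname, student.location)
```

In this sketch the location is filled in when the row is read, which mirrors the statement that the attribute travels with the entity rather than being repeated in every request.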


Fig. 4. Example of object-relational mapping with a location attribute

In data selection operations this attribute is set automatically to the place where the data are stored, whereas in add operations the attribute must be set by the programmer. Therefore, developing ORM technology for a hybrid cloud database means developing an intelligent module that determines the optimal place to store data, which will improve system performance. The appropriate storage place should be determined automatically on the basis of many criteria, such as channel bandwidth, server load, number of clients, and others. Many of these parameters can be obtained by experiment, so the intelligent data storage control module has to be adapted on the basis of data collected from the system during trial operation.

Based on the features of object-relational mapping and of distributed databases, the following classes, shown in Fig. 5, were designed to implement the ORM library and to solve the distributed database problem.

Fig. 5. Structure of the ORM library

Classes «DbEntity» provide, through inheritance from the class «BasicDbEntity», the mapping of a single entity to an object, as shown in Fig. 6. This class contains all the necessary methods for working with data, namely select, add, change and delete.

Fig. 6. Example of the mapping of one entity with no links

Fig. 7. Example of mapping with a "one to many" link

Classes «DbEntityView» provide the mapping of entities connected by "many to many" and "one to many" links, through inheritance from «BasicDbEntityView», as shown in Fig. 8. These classes monitor data integrity for all links.

Classes «DbProcess» provide the connection to the database and the execution of queries. Although a design that uses ORM often does not require SQL, this does not exclude cases in which a specific request to the database has to be performed. These classes are used by the classes «DbEntity», «DbEntityLink» and «DbEntityView» for direct queries.

Classes «QueryOptions» store requests in a special structure and generate the necessary queries with selection filters and sorting. These classes are used by the classes «DbEntity», «DbEntityLink» and «DbEntityView» to generate queries.
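As a rough illustration of how a «QueryOptions» class could store a request in a structure and generate a query with filters and sorting, consider the following Python sketch. Only the class names «QueryOptions» and «BasicDbEntity» come from the text; the fields `filters` and `order_by`, the methods `to_sql` and `select`, and the `?` placeholder style are assumptions, and equality is the only filter supported here.

```python
from dataclasses import dataclass, field


@dataclass
class QueryOptions:
    """Stores a request in a structured form and renders it as SQL."""
    table: str
    filters: dict = field(default_factory=dict)    # column -> required value
    order_by: list = field(default_factory=list)   # column names

    def to_sql(self):
        """Return a parameterized SELECT statement and its parameter list."""
        sql = f"SELECT * FROM {self.table}"
        params = []
        if self.filters:
            clauses = []
            for column, value in self.filters.items():
                clauses.append(f"{column} = ?")
                params.append(value)
            sql += " WHERE " + " AND ".join(clauses)
        if self.order_by:
            sql += " ORDER BY " + ", ".join(self.order_by)
        return sql, params


class BasicDbEntity:
    """Hypothetical base class: subclasses only name their table."""
    table = ""

    @classmethod
    def select(cls, **filters):
        # A real library would pass the result to its connection layer.
        return QueryOptions(cls.table, filters).to_sql()


class Student(BasicDbEntity):
    table = "Students"


sql, params = QueryOptions("Students", {"surname": "Ivanov"}, ["name"]).to_sql()
print(sql)     # SELECT * FROM Students WHERE surname = ? ORDER BY name
print(params)  # ['Ivanov']
```

Keeping the values in a separate parameter list, rather than pasting them into the SQL string, is a design choice of this sketch; the text does not say how the library passes values to the database.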


Fig. 8. Example of mapping with "one to many" and "many to many" links

IV. CONCLUSION

The use of object-relational mapping technology makes it possible to control the location and integrity of the data and to automate the development of information systems in a hybrid cloud infrastructure.

Designing ORM systems on the principle of object inheritance makes it possible to change any method of an object without changing the system architecture and without going through all of the code.

REFERENCES

[1] Ambler S. W. “Mapping Objects to Relational Databases: O/R Mapping In Detail”, Ambysoft Inc., 2010.
[2] Buyya R., Calheiros R. N., Li X. “Autonomic cloud computing: Open challenges and architectural elements”. 2012 Third International Conference on Emerging Applications of Information Technology (EAIT), 2012, pp. 3–10. (doi: 10.1109/EAIT.2012.6407847)
[3] Hellerstein J. L. “Challenges in control engineering of computing systems.” Proceedings of the 2004 American Control Conference, 2004, vol. 3, pp. 1970–1979.
[4] Jurczyk P., Xiong L. “Dynamic Query Processing for P2P Data Services in the Cloud.” Database and Expert Systems Applications, Springer, Berlin, Heidelberg, 2009, pp. 396–411.
[5] Lemmon M. D. “Towards a passivity framework for power control and response time management in cloud computing”. Proceedings of 7th International Workshop on Feedback Computing, 2012.
[6] Mahmood Z. (ed.) “Cloud Computing. Challenges, Limitations and R&D Solutions.” Springer International Publishing, 2014.
[7] Nathuji R., Kansal A., Ghaffarkhah A. “Q-clouds: managing performance interference effects for QoS-aware clouds.” Proc. of the 5th European Conference on Computer Systems, 2010, pp. 237–250.
[8] Pandey S., Voorsluys W., Niu S., Khandoker A., Buyya R. “An autonomic cloud environment for hosting ECG data analysis services.” Future Generation Computer Systems, 2012, vol. 28, no. 1, pp. 147–154.
[9] Pluzhnik E. V., Nikulchev E. V. “Use of Dynamical Systems Modeling to Hybrid Cloud Database.” Int'l J. of Communications, Network and System Sciences, 2013, vol. 6, no. 12, pp. 505–512. (doi: 10.4236/ijcns.2013.612054)
[10] Pluzhnik E., Nikulchev E. “Virtual laboratories in cloud infrastructure of educational institutions”. 2nd International Conference on Emission Electronics (ICEE). Selected papers, 2014, pp. 67–69. (doi: 10.1109/Emission.2014.6893972)
[11] Pluzhnik E., Nikulchev E., Payain S. “Concept of Feedback in Future Computing Models to Cloud Systems.” World Applied Sciences Journal, 2014, vol. 32, no. 7, pp. 1394–1399. (doi: 10.5829/idosi.wasj.2014.32.07.588)
[12] Pluzhnik E., Nikulchev E., Payain S. “Optimal control of applications for hybrid cloud services.” Proc. 2014 IEEE 10th World Congress on Services (SERVICES 2014). Anchorage, USA, 2014, pp. 458–461. (doi: 10.1109/SERVICES.2014.88)
[13] Pluzhnik E. V., Nikulchev E. V., Payain S. V. “Laboratory test bench cloud and network technologies.” Cloud of Science, 2014, vol. 1, no. 1, pp. 78–87. (http://cloudofscience.ru/sites/default/files/pdf/CloudOfScience01017887.pdf)
[14] Pluzhnik E. V., Nikulchev E. V., Payain S. V., Lukyanchikov O. I. “Design of distributed systems in a hybrid cloud infrastructure.” Int. conf. Engineering & Telecom – En&T 2014. Book of abst., 2014, pp. 212–214.
[15] Sithole E., McConnell A. et al. “Cache Performance Models for Quality of Service Compliance in Storage Clouds.” Journal of Cloud Computing: Advances, Systems and Applications, 2013, vol. 2, no. 1, pp. 1–24.


Definition of Tactile Interactions for a Multi-Criteria Selection in a Virtual World

Robin Vivian, University of Lorraine, Laboratory PErSEUs, Metz, France
David Bertolo, University of Lorraine, Laboratory LCOMS, Metz, France
Jérôme Dinet, University of Lorraine, Laboratory PErSEUs, Metz, France

Abstract: Tablets and smartphones are becoming increasingly common, and their interfaces are predominantly tactile and often multi-touch. More and more schools are testing them with their pupils in the hope of pedagogic benefits. With this new type of device, new interactions become possible. Many studies have addressed the manipulation of 3D objects with 2D input devices, but studies that link the needs of pedagogy to the possibilities of these new types of interaction are only beginning. FINGERS© is an application for learning spatial geometry. It is written for pupils from 9 to 12 years old, and its interactions have been designed with teachers. Some interactions are specific to 3D geometry (3 DOF translations, rotations, nets, combinations of cubes, etc.) and some are general, like designation or multi-selection. Many grammars of gestures propose a set of interactions to select an object or a group of objects; multi-taps or a lasso around an area are commonly adopted interactions. Performing geometry exercises requires imagining other interactions: for example, how to select all cubes, or how to select all green objects. The real question is how to introduce a parameter into a selection. After presenting the limits of current solutions, this communication presents the solutions developed in FINGERS© and explains how they allow a “parameterized” selection.

Keywords: tactile surface; tablets; gestures; cognitive; human-centred design; iPad

ACM Classification Keywords: H.5.2. Information interfaces and presentation: User Interfaces – Interaction styles; evaluation/methodology; user-centered design

I. INTRODUCTION

The commercial success of tablets requires researchers in human-computer interaction to imagine new ways to interact with these devices. Today, people use them individually [19], in groups [12], as part of multi-display environments [8], and for fun and entertainment [20]. These devices provide interaction techniques that are often intuitive and easy to use in 2D. However, the manipulation of objects in 3D is still a challenge. Manipulations in 2D have been defined very simply (selection, moving, designation). Things become more complex when manipulations have to be performed in space. In 3D environments, widgets [13] have been widely used to make 3D manipulation easier. Moscovich [14] showed how to design touchscreen widgets that respond to a finger's contact area. Schmidt et al. [17] presented an interface for 3D object manipulation in which standard transformation tools are replaced with transient 3D widgets invoked by sketching context-dependent strokes. Designating an object in 2D or 3D is relatively simple: a user would naturally choose a direct touch on the object. Selecting a group of objects can be done either by repeated touches or by drawing an area around the group (the lasso technique or selection box). Both approaches seem to meet all needs, and few works propose solutions for more complex designations. For example, what grammar of gestures would select all objects with the same geometric shape? What grammar of gestures would select objects with the same color? This paper explores innovative and intuitive ways to provide simple solutions to a problem that can be complex.

The paper proposes to use the characteristics of objects to answer these questions. An object has different parameters, such as geometric shape, color and others. By examining the designated objects, it is possible to identify their common features and deduce the user's intention.

II. RELATED WORKS

Multi-touch surface computing provides a set of interactions that are closer analogues to physical interactions than windowed interfaces are. Building natural and intuitive gestures is sometimes a difficult problem, and sometimes the gesture is not natural. How should one define a gesture to move an object in a virtual world along 3 directions with only one hand [2] or with two hands [12]? When you have only one hand to point, to move and to turn an object, your possibilities to interact with this entity are poor. In 2009, Moscovich [14] showed how to design touchscreen widgets that respond to a finger's contact area and described limitations on the design of interactions based on sliding widgets. Creating new interactions (grammars of gestures) became a necessity. In 2008, Schmidt et al. [17] presented an interface for 3D object manipulation in which standard transformation tools are replaced with transient 3D widgets invoked by sketching context-dependent strokes. The majority of works try to characterize, quantitatively and qualitatively, the surface gestures made by users. Understanding users' mental models will improve knowledge of the relationships between technology and users. In 2002, Poggi [16] built a four-dimension typology along which gestures differ. These four dimensions are: mapping of meaning, semantic content, spontaneity and relation to speech. From an analysis of people collaborating around a drawing table, Tang [18] observed that gestures appear as an element of simulation for operations, referring to an area of interest in connection with users.

Some studies focus more specifically on handling objects in a 3D workspace. Gestures there are more complex and less intuitive. It is common to involve users in defining the input systems and, mainly, the grammars of gestures. Cohé and Hachet


[6] conducted a user study to better understand how non-technical users interact with a 3D object from touch-screen inputs. The experiment was conducted while users manipulated a 3D cube from three points of view for rotations, scaling and translations (RST). Their study shows a wide disparity in the gestures suggested by users. Figure 1 illustrates this disparity for rotation.

Fig. 1. Categorization of gesture for rotation around X axis

The gesture most often suggested by users accounts for only 17.9% of proposals. Even the sixth gesture still receives 8.6% of suggestions. In 1986, Bier [4] introduced two classes of widgets, “anchors” and “end conditions”, for the precise placement of shapes relative to each other. Since this first definition and the rise of tactile interfaces, widgets for 3D manipulation have appeared in rapid succession. When you use your finger to point, to move and to turn, your possibilities to interact with an entity are reduced. Some studies explore multi-touch controls to manipulate several degrees of freedom at the same time. Hancock et al. [9][10] proposed using from one to three fingers to handle objects in shallow depth. Martinet, Casiez and Grisoni [12] explored the design of free 3D positioning techniques for multi-touch displays to exploit the additional degrees of freedom provided by multi-touch technology. Their contributions are twofold: first, an interaction technique that extends the standard four-viewport technique found in commercial CAD applications by adding a teleportation system; second, a technique designed to allow free 3D positioning with a single view of the scene, the Z-technique (Fig. 2).

Fig. 2. Z-technique and multi-touch viewport technique

From a short preliminary study, Cohé et al. [5] show that selection of the DOF controls is difficult as soon as the graphical elements project close to each other on the screen. They note that it is difficult to control all the DOF when they are displayed at the same time. They propose an alternative approach and built a tBox controlled with a finger (Figure 3).

Fig. 3. A tBox to control 9 DOF

User-centered design is a way of designing human-computer interfaces, but there is a gap between users and designers. User behaviors are often complex to implement and often inefficient as a basis for design. Foley et al. [7] observe that a user-computer dialogue is at the origin of all languages of inputs and outputs. As in speech recognition, feedback is indispensable for developing an exchange between two entities (human or not).

In their work on user-defined gestures, Wobbrock et al. [21] try to control this feedback so that users do not have to revise their mental model. Gestures are performed on a tactile table (Microsoft table): participants perform a gesture, for example to pan a field of objects, after a learning animation. The initial hypothesis is that not every action or command can be performed by a gesture. “So what is the right number of gestures to employ?” They ran a field experiment with 20 participants. Like Cohé and Hachet [6], they presented the participants with a set of 27 commands and asked them to imagine a corresponding gesture.

Fig. 4. Objects used for Wobbrock’s experiment

Table 1 shows these 27 commands.


TABLE I. THE 27 COMMANDS FOR WHICH PARTICIPANTS CHOSE GESTURES. MEAN: 5-POINT LIKERT SCALE (1 = SIMPLE, 5 = COMPLEX)

From this analysis we extracted two specific items related to the selection of objects (lines 3 and 12). The 20 participants consider that designating an object is a simple action. Making a multiple selection is already considered more complex. In addition, the multiple selection was basic: participants merely designated several patterns, and the selection was not constrained by specific characteristics such as colors or shapes.

III. CRITIQUE OF ACTUAL SOLUTIONS

With a small quantity of objects, or in specific situations, the problem of multi-selection is always simple. The gestures created to solve this problem are easy to understand, simple to perform and very efficient. The two main options are a lasso or a designation by multi-touch (Figure 5).

Fig. 5. Wobbrock’s propositions for Select Single or Select Group

These gestures are often presented as efficient and affordant solutions, but they are not suitable when the complexity increases. They do not answer the following questions:

• How to select all the squares / cubes / circles?
• How to select all red objects?
• How to select all wired objects?

More generally, how can objects be selected by providing parameters such as form, color or representation? In agreement with primary school teachers, FINGERS© proposes different categories of selection. After presenting FINGERS© and its functionalities, we describe the solutions adopted for the selection of objects.

IV. FINGERS APPLICATION

FINGERS© (Find INteractions for Geometry learNERS) is an application on tactile tablets that helps young students to learn geometry in 3D space [1][2][3]. The study is restricted to mobile devices like the iPad (this tablet is present in a large number of schools in France). The main goal is to manipulate a solid accurately even if one. Moreover, the scene can contain several mathematical objects like Cube, Sphere, Pyramid and parallelogram. To permit the largest possibilities of manipulation, each solid has to be manipulable independently, as well as the entire scene. To test the potential pedagogic benefits of our set of interactions, a prototype was implemented with different functionalities:

A. Creation and suppression of solids

To create or delete a solid, FINGERS© uses tangible solids or a tangible eraser (Fig. 6). Putting a tangible solid on the screen creates a virtual solid under the tangible object. Putting a tangible eraser, like a rubber, on a virtual solid deletes it. You can also delete an object by sliding it to the edge of the tablet.

Fig. 6. A tangible cube used to create a virtual cube in the prototype.

B. Selection

A long press on a solid makes its reference system appear (Fig. 8a). With one more long press, selectable vertices appear (Fig. 9). With another long touch the edges are drawn (Fig. 8b). With another long touch the wire mode appears (Fig. 8c). The selection system is cyclic.

C. Translation

FINGERS© allows an object to be translated along the screen plane in indirect mode (a touch outside the object) if the solid is selected, or in direct mode (a touch on the object and a move along the screen). If you add a second touch (1 touch + 1 touch, which is different from 2 synchronized contacts) you can control the object along the z-axis (the gesture is similar to a zoom).

The initial distance between the two fingers is recorded as the reference distance. Depth translation is performed by varying the distance between the two contacts. If the two contacts move while keeping a constant distance, the solid moves along the screen plane. Moving the two contacts while varying the distance between them at the same time makes it possible to manage 3 DOF.
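The two-contact translation described in subsection C (screen-plane motion when the contacts move together, depth motion when their distance changes relative to the recorded reference distance) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the use of the midpoint of the two contacts for the in-plane motion, the linear `depth_gain` factor and all function names are hypothetical and are not taken from FINGERS©.

```python
import math


def distance(p, q):
    """Euclidean distance between two screen points (x, y)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def midpoint(p, q):
    """Point halfway between two screen points."""
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)


def two_finger_translation(start, current, depth_gain=1.0):
    """Return (dx, dy, dz) for a two-contact drag.

    start, current: pairs of (x, y) contacts at gesture start and now.
    dx, dy follow the midpoint of the contacts (screen-plane motion).
    dz follows the change of the contact distance relative to the
    reference distance recorded at gesture start.
    """
    reference = distance(*start)
    m0, m1 = midpoint(*start), midpoint(*current)
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    dz = depth_gain * (distance(*current) - reference)
    return dx, dy, dz


start = ((0.0, 0.0), (100.0, 0.0))
# Both contacts slide 10 px to the right: pure screen-plane motion.
print(two_finger_translation(start, ((10.0, 0.0), (110.0, 0.0))))
# Contacts spread apart symmetrically: pure depth motion.
print(two_finger_translation(start, ((-10.0, 0.0), (110.0, 0.0))))
```

Because the in-plane and depth components are computed independently from the same pair of contacts, a single gesture that both moves and spreads the fingers yields all three components at once, which corresponds to the 3 DOF case mentioned in the text.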


D. Duplication

Duplication is managed by an interaction with three contacts. You put one direct contact to designate the solid and two indirect contacts to indicate the position where the duplicated solid appears. If the two indirect contacts begin close to the solid, the duplication is restricted to the –x or x-axis, or to the –y or y-axis. You can also use one touch on the object and two contacts anywhere on the screen plane to position a free duplication.

E. Manipulation of a net of polyhedron

To generate and manipulate a net of a polyhedron, the solid needs to be selected. Two hands are used for the interaction, with four indirect contacts. By moving two fingers of each hand in opposite directions (Fig. 7) we fold or unfold a net of the polyhedron. When a net is open, a double tap places the net on the screen plane so that it can be modified. A direct contact on a face allows it to be moved. Pupils can check the validity of the resulting net by folding it. They can also experiment with different possibilities and check them visually. A net of a polyhedron can be manipulated in the same way as a solid.

Fig. 7. Unfold a cube and modify its net.

F. Rotation

All rotation interactions in our set are indirect. Rotations are mapped to two-finger interactions. A solid must be selected to be rotated. Rotation interactions are divided into three categories depending on the reference system. The three reference systems employed are the screen frame, the object frame centered on the object, and the scene frame (Fig. 8a).

a) Reference system of the solid (3 axes)
b) Edges representation
c) Wire representation

Fig. 8. Cyclic action on object

Rotations in the reference system of the screen use two fingers to rotate the object around the axes. Following Nacenta [15], a magnitude filtering technique is used to minimize unwanted rotations. Rotations in the reference system of the object are constrained by a defined axis. When a solid is selected, its reference system appears (Fig. 8a). A touch on the sphere or the cone of an axis selects that axis, and the solid can then rotate only around the selected axis. Rotations in the reference system of the scene are all the other rotations, where the axis is defined by two vertices of the solid. The main problem is to define the axis of rotation. To solve this problem, two states of selection are introduced. Our selection system is cyclic. A one-second long press on a solid makes its reference system appear. One more second makes selectable vertices appear. A rotation axis is defined by selecting two vertices (Fig. 9). A two-finger slide rotates the solid.

Fig. 9. Axis of rotation defined by two vertices

G. Changing position of observer

We went into schools to observe 3D geometry lessons. Pupils walked around a real model of the exercise to verify their results. This interaction was so intuitive that we adopted it as our solution for turning around the scene.


A one-second long press with one finger on each side of the tablet starts or stops moving the observer. A new background color gives visual feedback. The gyroscopic sensor is used to modify the observer's position around the scene. This is the video camera metaphor: the tablet acts as a window onto the scene, and moving the tablet in space changes the viewpoint into the scene (Fig. 10).

Fig. 10. Gyroscopic sensor and video camera metaphor to change the position of observer

In addition to conventional interactions, some selection functions are incorporated into FINGERS© for a single object or a group of objects. The problem was to build a simple grammar of gestures to select objects with different criteria.

V. PROPOSITIONS FOR MULTI-SELECTION

Figure 11 shows a simple example of the problem. Imagine that the user wishes to select all green cubes, or all parallelograms of different colors (green, red, yellow and blue). Using the lasso technique or selecting the objects one by one is difficult or even impossible. It is also possible to reason by subtraction: how to select all objects except the pyramids, or all green cubes except the one in the bottom left corner?

Fig. 11. Example of situation containing multiple objects in FINGERS©

The general form of the problem is how to introduce a parameter such as number, color, form, representation or even position into a tactile selection query.

In the FINGERS© application, a long touch is used to select an object. A simple tap is not used as a principal action on an object. The easiest way was to use these two interactions to build an action of multiple selection. Consistency with the other actions is maintained, and the interaction is easy to remember and very efficient. Following Kammer [11], we define the syntax of our grammar of gestures in an extended Backus-Naur Form. The small part of this grammar used for selection is:

• one symbol for a long touch
• one symbol for a tap
• + Two gestures performed in an asynchronous manner
• * Two gestures performed in a synchronous manner

An action of multiple selection can be written:

Multi ::= long touch (Initial object) + tap (Destination object).

Fig. 12. First action, selecting an object with FINGERS©

A. Selecting all same objects

For example, to select all blue cubes you perform a long touch on an initial object and a tap on a destination object. The initial and destination objects have the same form, the same color, the same representation (plain or wire) and different positions in space. FINGERS© understands that the selection must be applied to all blue cubes and selects them with only one gesture (Fig. 13).

Fig. 13. Gesture to select all blue cubes


B. Selecting all same forms

To select objects with the same shape (not necessarily the same color), you apply the same selection interaction to two objects of that shape (a long touch on one and a tap on the other). Figure 14 shows, for example, how to select all cubes whatever their color.

Fig. 14. Gesture to select all cubes

C. Selecting all same colors

Similarly, it is possible to select all objects with the same color (whatever their form or representation). Figure 15 shows the selection of red objects.

Fig. 15. Gesture to select all red objects

D. Selecting all objects

One function remains: the selection of all objects. When the action involves two unrelated objects, the interaction is applied to all objects of the 3D space. A long touch on one object and a tap on another one (with no common features) will select all elements (Figure 16).

Fig. 16. Gesture to select all objects

E. Unselect one object from selection

Sometimes it is necessary to make an incomplete selection, for example if the user needs to delete all green cubes except the cube located in the bottom left corner. The intuitive steps are:

• Select all objects
• Unselect one cube (bottom left corner)
• Delete all selected cubes.

In selection mode, an object has two states: on the one hand a “selected” state, on the other hand an “unselected” state (it is a binary state).

A long touch on a blue cube followed by a tap on another blue cube is the interaction that selects all these objects. This final state is represented in Figure 17 by a symbolic red cross on all green cubes.

Fig. 17. All blue cubes selected
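The selection rules of subsections A to D, together with the binary selected/unselected state used for unselecting one object, can be summarized by one rule: the attributes shared by the two designated objects become the selection criteria. The Python sketch below illustrates that reading. The class `Solid`, the attribute names and the functions `common_criteria`, `multi_select` and `toggle` are assumptions made for illustration; the text does not describe how FINGERS© implements the rule.

```python
from dataclasses import dataclass

CRITERIA = ("shape", "color", "representation")


@dataclass
class Solid:
    shape: str
    color: str
    representation: str   # "plain" or "wire"
    selected: bool = False


def common_criteria(initial, destination):
    """Attributes on which the two designated objects agree."""
    return {c: getattr(initial, c) for c in CRITERIA
            if getattr(initial, c) == getattr(destination, c)}


def multi_select(scene, initial, destination):
    """Long touch on `initial`, then tap on `destination`."""
    shared = common_criteria(initial, destination)
    for solid in scene:
        # With no shared attribute, all() over an empty dict is True,
        # so every object of the scene is selected.
        solid.selected = all(getattr(solid, c) == v for c, v in shared.items())
    return shared


def toggle(solid):
    """Tap on an object in selection mode: flip its binary state only."""
    solid.selected = not solid.selected


scene = [Solid("cube", "blue", "plain"), Solid("cube", "blue", "plain"),
         Solid("cube", "red", "plain"), Solid("pyramid", "red", "wire")]

multi_select(scene, scene[0], scene[1])   # two blue plain cubes
print([s.selected for s in scene])        # [True, True, False, False]
multi_select(scene, scene[2], scene[3])   # two red objects
print([s.selected for s in scene])        # [False, False, True, True]
multi_select(scene, scene[0], scene[3])   # nothing in common
print([s.selected for s in scene])        # [True, True, True, True]
```

Under this reading, adding a further criterion such as size would only require adding its name to `CRITERIA`; whether the application generalizes in exactly this way is not stated in the text.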


A tap on a selected blue cube changes the state of that cube only. All similar objects stay selected, and the user can manipulate them in a single interaction.

Fig. 18. Only one cube unselected

A long touch on empty space unselects all objects (Figure 19).

Fig. 19. Back to start situation

VI. CONCLUSION

With the development of tactile devices, software designers have imagined interactions to perform complex tasks. Sometimes, actions deemed simple have not received particular attention: "cuius rei demonstrationem mirabilem sane detexi. Hanc marginis exiguitas non caperet" (Fermat’s last theorem). The problem of selecting multiple items is a good example. This paper shows that the problem may be more complex than it seems and that the existing solutions are insufficient. Our proposition is a simple and effective approach based on two common gestures, a long touch and a tap. The proposed actions are intuitive. A user can select groups of objects by varying different criteria. Our selection mode allows designation with three different criteria (shape, representation, color). Without common criteria, our gesture is interpreted as a selection of all the elements. It is possible to imagine other criteria such as position, orientation or size by changing only a small part of the interaction. This solution will soon be integrated into FINGERS© and tested in the actual software.

REFERENCES

[1] Bertolo D., Vivian R. and Dinet J., “Proposition and evaluation for a categorization of interactions in 3D geometry learning context”, International Journal of Advanced Computer Science, Vol. 3, No. 12, December 2013, pp. 1-8.
[2] Bertolo D., Vivian R. and Dinet J., “A set of interactions to rotate solids in 3D geometry context”, CHI’13 Extended Abstracts, ACM SIGCHI Conference on Human Factors in Computing Systems Proceedings, 2013, April 27–May 2, Paris, France.
[3] Bertolo D., Vivian R. and Dinet J., “A set of interactions to help to resolve 3D geometry problems”, Science And Information Conference 2013, October 7-9, 2013, London, UK.
[4] Bier E. A., “Skitters and jacks: interactive 3d positioning tools,” in I3D ’86: Proceedings of the 1986 workshop on Interactive 3D graphics, ACM, 1987, pp. 183–196.
[5] Cohé A., Décle F. and Hachet M. tBox: A 3D Transformation Widget designed for Touch-screens. CHI 2011, Canada, 3005-3008.
[6] Cohé A. and Hachet M. Understanding User Gestures for manipulating 3D objects from touchscreen inputs. Graphics Interface Conference, Toronto, 2012.
[7] Foley, J.D., van Dam, A., Feiner, S.K. and Hughes, J.F. (1996) The form and content of user-computer dialogues. In Computer Graphics: Principles and Practice. Reading, MA: Addison-Wesley, 392-395.
[8] Forlines, C., Esenther, A., Shen, C., Wigdor, D. and Ryall, K. (2006) Multi-user, multi-display interaction with a single-user, single-display geospatial application. Proc. UIST '06. New York: ACM Press, 273-276.
[9] Hancock M., Ten Cate T., and Carpendale S. Stickytools: Full 6DOF force-based interaction for multi-touch tables. In Proc. ITS, pages 145–152, 2009.
[10] Hancock M., Carpendale S., Vernier F., Wigdor D., Shen C. Rotation and translation in Proceedings of the SIGCHI conference on human-computers system pp 79-88 206
[11] Kammer D., Wojdziak J., Keck M., Taranko S. Towards a formalization of multi-touch gesture. ITS’10, November 7-10, 2010, Saarbrücken.
[12] Martinet A., Casiez G., and Grisoni L. The design and evaluation of 3d positioning techniques for multi-touch displays. In 3D User Interfaces, 2010 IEEE Symposium, 115–118.
[13] Morris, M.R., Huang, A., Paepcke, A. and Winograd, T. (2006) Cooperative gestures: Multi-user gestural interactions for co-located groupware. Proc. CHI '06. New York: ACM Press, 1201-1210.
[14] Moscovich T., “Contact area interaction with sliding widgets,” in UIST ’09: Proceedings of the 22nd annual ACM symposium on User interface software and technology, ACM, 2009, pp. 13–22.
[15] Nacenta M.A., Baudisch P., Benko H. and Wilson A., “Separability of spatial manipulations in multi-touch interfaces,” in Proceedings of Graphics Interface 2009, 2009, pp. 175-182.
[16] Poggi, I. (2002) From a typology of gestures to a procedure for gesture production. Int'l Gesture Workshop 2001, LNCS vol. 2298. Heidelberg: Springer-Verlag, 158-168.
[17] Schmidt R., Singh K., and Balakrishnan R., “Sketching and composing widgets for 3d manipulation,” Computer Graphics Forum, Proceedings of Eurographics 2008, 2008, pp. 301-310.
[18] Tang, J.C. (1991) Findings from observational studies of collaborative work. Int'l J. Man-Machine Studies 34 (2), 143-160.
[19] Wellner, P. (1993) Interacting with paper on the DigitalDesk. Communications of the ACM 36 (7), 87-96.
[20] Wilson, A.D. (2005) PlayAnywhere: A compact interactive tabletop projection-vision system. Proc. UIST '05. New York: ACM Press, 83-92.
[21] Wobbrock J., Ringel Morris M. and Wilson A. User-defined gesture for surface computing. CHI 2009, Boston.


Natural Language Processing and its Use in Education
Dr. Khaled M. Alhawiti
Computer Science Department, Faculty of Computers and Information Technology
Tabuk University, Tabuk, Saudi Arabia

Abstract—Natural Language Processing (NLP) is an effective approach for improving educational settings. Implementing NLP involves initiating the process of learning through natural acquisition in educational systems. It is based on effective approaches for solving various problems and issues in education. NLP provides solutions in a variety of fields associated with the social and cultural context of language learning. It is an effective approach for teachers, students, authors, and educators, assisting with writing, analysis, and assessment procedures. NLP is widely integrated with a large number of educational contexts such as research, science, linguistics, e-learning, and evaluation systems, and contributes to positive outcomes in other educational settings such as schools, higher education systems, and universities. The paper aims to address the process of natural language learning and its implications in educational settings. The study also highlights how NLP can be utilized with scientific computer programs to enhance the process of education. The study follows a qualitative approach. Data is collected from secondary resources in order to identify the problems that teachers and students face in understanding context because of language obstacles. The results show the effectiveness of linguistic tools such as grammar, syntax, and textual patterns, which are fairly productive in the educational context for learning and assessment.

Keywords—Natural Language Processing; education; application; e-learning; scientific studies; educational system

I. INTRODUCTION
Natural language processing is an effective means of assisting students in the process of scientific learning. Implementing NLP in the educational setting not only helps in developing an effective language process, but is also significant for enhancing academic performance. NLP techniques follow the natural process of language acquisition, integrated with the scientific approach of using computer programs.

A. Outline of the Study
This study is based on the “Application of Natural Language Processing in Education”. The first section provides an introduction to the topic that discusses and defines the background of the natural language learning process. This part also focuses on the aims and objectives of the study. The second section contains material discussing natural language processing for the Arabic language and its implementation in the educational framework. The next section comprises the procedure and method of the study; Material and Methods provides sufficient detail to allow the work to be reproduced. Results provides a detailed analysis of the data collected in the study. The Discussion comprises a detailed analysis and evaluation of the results based on the data obtained from the research. The last section is the conclusion, which comprises a summary of the study, useful implementations, and recommendations for further research.

B. Background of the Study
Natural language processing (NLP) is a major factor associated with the branch of science that focuses on the development and improvement of the process of learning. NLP provides theoretical grounds for developing techniques and approaches that assist scientific learning by utilizing effective theories and approaches. NLP can be effectively applied in education for promoting language learning and enhancing the academic performance of students [1]. Natural language processing helps develop an effective process of learning in the educational setting by developing scientific approaches that make use of computers and the internet to improve learning. To provide this assistance, there are a number of computer programs and language learning approaches that ensure students can easily develop an understanding of educational content in natural settings. This is based on utilizing an effective and efficient language learning process in natural settings [1]. NLP draws on the natural language process to develop effective approaches that bring improvement to educational settings.

The approach in NLP is more focused on developing educational software systems and educational strategies that can assist in utilizing natural languages for education, for example, e-rater and Text Adaptor [2]. Software systems that incorporate NLP have the ability to identify the process of language learning in natural settings.

Natural language processing is also an effective approach for developing an efficient system of managing linguistic input in natural settings through various words, sentences, and texts. Natural language processing also uses various grammatical rules and linguistic resources such as derivations, inflections, grammatical tenses, semantic systems, lexicons, corpora, and morphemes. All of these can be applied in educational settings to ensure that students develop a better understanding of the educational material and curriculum.


Natural Language Processing is a widely recognized area in language learning all over the world. It has been successfully implemented in different languages as an effective way of improving educational systems. English is the most commonly studied language in the research literature, which shows the effectiveness of using the natural language learning process in education. NLP is also an effective approach for improving the educational system in Arab countries [3]. Although there are various approaches for improving social and educational settings, NLP is the most suitable approach, in which natural language processing is used to create NLP tools that promote education. These tools are based on utilizing various effective approaches for assisting the process of education at college and university level. This requires developing tools and corpus resources for the educational system in the Arabic language.

C. Aims and Objectives
The major aims and objectives of this study are to:
• Understand natural language learning and processing.
• Apply natural language processing (NLP) in educational settings.

II. MATERIAL AND METHODS
This study is based on a qualitative approach. Data collection is based on gathering information from secondary resources and analysing the theories that support and assist in understanding natural language processing and its implementation in education. The aim is to identify the various problems that teachers and students face in understanding context because of language obstacles. Linguistic tools such as grammar, syntax, and textual patterns are very effective for learning and for the assessment of text [4].

A. NLP and Educational Setting
A number of NLP approaches assist in educational settings, such as the use of empirical data, corpora, and other linguistic resources that are essential and effective for the process of language learning. Corpora are very effective because they provide a large amount of computational data for spoken and written language. For example, for British English, the British National Corpus (BNC) provides a large amount of data about vocabulary usage [4,5]. This large collection provides sufficient data regarding the usage of words, which helps enhance the knowledge and academic skills of students.

Various approaches are effective for managing patterns of grammar and other linguistic approaches. NLP is also an effective technique for assessment, enhancing the ability of students to identify the relationships between different words and the use of such words in the search engine for generating treasure [5]. It is therefore an effective approach that allows learners and teachers to use these words more efficiently. The assessment procedure requires entering correct information in the text in order to proceed to the next level. NLP assessment allows analysis of the students’ information by matching it against the requirements of the content [6].

B. Tools and Methods
With access to a variety of tools, NLP is implemented in multiple fields such as laboratories, faculties of information, e-learning, and education. The availability of search engines also provides sufficient information for information search, but language constraints are a major issue for the majority of students and hinder language learning through electronic sources and online material available on the internet. Natural language learning and processing is also associated with the understanding of various linguistic tools such as grammatical construction, syntax, and sentence composition [7].

The application of NLP to e-learning is an effective approach, especially in the area of education. Natural language processing can help students develop a general understanding of the cognitive and psychological perspectives that play an important role in language acquisition. In the educational process, natural language processing can be implemented effectively, with positive attributes such as the specification of a synchronous or asynchronous mode [8].

The methods for implementing NLP in education require the use of e-learning or of teaching material for further development and improvement [8,9]. The method of this study also requires implementing effective approaches and utilizing language resources to improve the system of education. There are also a large number of tools and methods that assist in the utilization of language technology, such as linguistics software systems, which are very effective for managing the need to improve the educational system [9].

The implementation of language tools can assist in developing a better understanding of content, such as a better understanding of material while reading and the development of reading text and material. There are two sides to the educational system: learning and teaching. Thus, tools for both teaching and learning are effective for assisting the process of education, such as websites, publications, digital libraries, e-books, podcasts, and scientific materials [10]. This is one of the most effective approaches, allowing students and teachers to focus and to explore more in their field of study. There are various effective methods for implementing NLP in educational settings, such as the classification and categorization of different sources from a learning perspective. This can assist in identifying authentic sources and avoiding the use of unreliable resources.
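The corpus-based vocabulary statistics described under "NLP and Educational Setting" can be illustrated with a small word-frequency count. This is only a minimal sketch under stated assumptions: the function name `word_frequencies`, the simple regular-expression tokenizer, and the sample sentence are hypothetical and are not taken from the BNC or from any tool cited in the paper.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count lowercase word tokens in `text` (hypothetical helper).

    Tokenization is deliberately naive: runs of letters and apostrophes.
    A real corpus such as the BNC uses far richer annotation.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Illustrative sample text (an assumption, not corpus data).
sample = "The cat sat on the mat. The mat was flat."
freqs = word_frequencies(sample)

# The most frequent words give a rough picture of vocabulary usage.
print(freqs.most_common(2))
```

A teacher or learner could apply the same count to a larger collection of course texts to see which words dominate the material.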


Another method to assist language learners is to enable students to concentrate on their course material and the content of the given topic. This approach is based on matching the course content knowledge of learners; the method is derived from the assessment procedure of NLP. For example, students can be asked to write an essay that matches their course content with current information gathered from online sources [11].

C. Theory
Reference [2] discussed emerging opportunities for improving natural language processing and its effectiveness for developing various educational tools, such as tools for reading and writing content. According to Reference [2], NLP provides theoretical grounds and practical implementations for technology-based computer systems. The advancement of computer technology and the growth of second language learning have resulted in a trend of using language as a major tool for providing assistance in educational settings. According to Burstein, “the general lack of computer based technology proved to be an obstacle early on” [2]. NLP allows the assessment of readability and text quality through text analysis.

The linguistic aspects of the educational context can assist in managing the complexities of reading and writing. This can be done by analyzing syntactic and morphological factors. Motivation in language learning is an effective approach, which can also be applied to enhance the educational practices and academic performance of students [12]. Teachers can motivate students to focus on and develop an understanding of their content; this also requires effective strategies for the students to set the programs in various domains. Linguistic tools assist in evaluating and managing problems of lexical and syntactic composition in writing and in developing an understanding of content [13, 14]. Learning and developing a better understanding of language can also lead to a better understanding of the content. The theory of natural language acquisition is also an effective way to explain the grounds for developing content knowledge acquisition.

III. RESULTS AND DISCUSSIONS
A. Innovative Education Applications of NLP
Text Evaluator, also known as Source Rater, is a tool that represents a trending concept for modeling text complexity, created to help developers assess source material for use in the creation of new reading comprehension passages and items. The tool unites a large, cognitively based feature set with sophisticated psychometric approaches in order to produce text complexity categorizations that are highly associated with the categorizations provided by experienced educators.

Language Muse is another educational application of NLP: a web-based application for instructional authoring, intended to support K-12 instructors in creating curriculum material for English-language learners (ELLs). The linguistic feedback given by this application highlights sentence structures, vocabulary, and discourse relations in classroom texts that may be unfamiliar to ELLs. Automated generation of test items is another application of Natural Language Processing [14].

B. Natural Language Processing and Education
Natural language processing has various applications for educational purposes. It is important to develop new software systems and advanced techniques for educational settings. The major purpose of using NLP in educational settings is to improve the educational system by implementing efficient and effective policies that make use of advanced technologies. For example, applying NLP to e-learning is a significant approach that assists in producing educational material alongside technological development. Another significant aspect of NLP applications is the participation of both teachers and students. A number of electronic and online sources are available in the English language, which help students and teachers to access materials. Apart from the convenience of a large number of online resources, a major concern is the increasing use of blogs, Wikipedia, and unreliable resources. This requires intelligent automatic processing to prevent the use of such unreliable resources and to promote the use of authentic resources. NLP in education is also effective for mining, information retrieval, and quality assessment.

C. Relationship between Language and Text
Results based on the secondary sources and theoretical grounds reveal that text and technology are closely related. Students cannot understand content without a proper understanding of the language; without it, learners cannot understand and retain information. The process of natural language acquisition is one of the most important strategies for language learning. The natural language acquisition process, along with motivation from teachers, can serve as a major source of improvement in students’ academic progress. Teachers and learners can both focus on the strategies that are efficient for utilizing language in educational settings. For example, research studies reveal that context analysis is not possible without an understanding of the text.

According to Reference [15], the implementation of natural language processing for education shows that teachers and educators can follow an NLP approach for designing and planning the curriculum. This can also help students to follow the same strategic approach for learning and understanding the content. The authors of texts and content can use this approach for storing and coding information for students, and students follow the same approach and linguistic composition for decoding the text. Therefore, knowledge of natural language processing is effective for educators, teachers, authors, and students for better learning and understanding of content and for improving writing skills. The results also suggest that this approach is effective in assisting students to study material from web resources, especially in e-learning.
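The idea of assessing readability and text complexity, raised in the Theory subsection and in the description of Text Evaluator, can be illustrated with two very simple surface measures. This is a rough sketch under stated assumptions, not the method used by Text Evaluator or by any tool cited here: the function name `text_complexity` and the two chosen measures (average sentence length and average word length) are hypothetical illustrations.

```python
import re


def text_complexity(text):
    """Return two crude complexity proxies for `text` (hypothetical helper).

    - avg_sentence_length: words per sentence
    - avg_word_length: letters per word
    Sentences are split on '.', '!' and '?', which is a simplification.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return {"avg_sentence_length": 0.0, "avg_word_length": 0.0}
    return {
        "avg_sentence_length": len(words) / len(sentences),
        "avg_word_length": sum(len(w) for w in words) / len(words),
    }


# Illustrative passage (an assumption, not data from the study).
result = text_complexity("I like cats. Dogs bark.")
print(result)
```

Longer sentences and longer words generally suggest a harder passage, which is why such counts are a common starting point before richer linguistic features are added.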


D. Application of Natural Language Processing in Education
A number of effective approaches assist in the process of e-learning and in using current web-based information related to the educational course and curriculum. E-learning applications and tools help learners to improve their education. Teachers also help their students to enhance their skills and knowledge by obtaining current information from online resources. NLP is also very effective for providing knowledge and information to students, applying e-learning and NLP to the need for analyzing and understanding text. Understanding of text is based on the development of research-based analysis of general and contextual learning.

Based on the research outcome, it is clear that students’ output can be increased by implementing NLP in education. NLP is a very effective approach for developing the understanding of students in natural settings and for assessing the information available from various sources. A better understanding of information and the ability to access information from the large amount of data available on websites and other online sources can assist in generating and gathering information. Therefore, based on the results and the effectiveness of NLP in the educational context, it is clear that NLP can be effectively applied to academic writing, assessment, writing test questions, and utilizing automatic writing systems for the preparation of objective tests [15].

The application of NLP in the education system is also very effective for the analysis of errors in objective assessments and for the assessment of essays. Various linguistic approaches and tools can be utilized for analyzing errors such as grammatical and stylistic errors. Teachers can easily mark these errors in the papers of students. There are various effective grammar checkers and evaluation sources that help resolve the problems of the current process of learning. Teachers can use NLP for the assessment of multiple-choice questions and for the analysis of grammatical patterns in the text to be analyzed. Applying a standard e-learning method is very effective for ensuring that students can efficiently apply the data in the e-learning system. This approach is effective not only for assessment but also for writing purposes, such as writing material for digital libraries, websites, and various other sources.

IV. CONCLUSIONS
In conclusion, Natural Language Processing and its educational applications provide an effective solution to the various problems and barriers in the educational system that affect the academic progress and learning of students. Language is one of the major concerns for students. NLP is an effective approach for assisting the progress and improvement of students’ learning ability, based on the development and implementation of various effective tools that assist writing, learning, and the assessment of texts, such as the use of search engines and electronic resources and the analysis of grammatical construction, syntax, and sentence composition. All of these are effective techniques that can be utilized to develop a structural framework for the analysis of texts.

Grammar, syntax, and sentence composition can be efficiently handled through linguistics software systems such as grammar checkers, which save time and provide assistance for both teachers and learners. There is therefore a need to develop an effective approach for the social and cultural perspectives. Implementation of NLP is also effective for using the e-learning approach in order to understand and learn from the data available from electronic sources. There are also future implementations of this research, which can assist in identifying complex patterns in language. Further research can be conducted to identify its impact on individual learning and the understanding of context, and the effectiveness of NLP in writing and assessment procedures.

REFERENCES
[1] Nadkarni, P. M., Ohno-Machado, L., & Chapman, W. W. (2011). Natural language processing: an introduction. Journal of the American Medical Informatics Association, 18(5), 544-551.
[2] Burstein, J. (2009). Opportunities for natural language processing research in education. In Computational Linguistics and Intelligent Text Processing (pp. 6-27). Springer Berlin Heidelberg. http://link.springer.com/chapter/10.1007/978-3-642-00382-0_2#page-1
[3] Habash, N. Y. (2010). Introduction to Arabic natural language processing. Synthesis Lectures on Human Language Technologies, 3(1), 1-187. http://www.morganclaypool.com/doi/abs/10.2200/s00277ed1v01y201008hlt010
[4] Liu, K., Hogan, W. R., & Crowley, R. S. (2011). Natural language processing methods and systems for biomedical ontology learning. Journal of Biomedical Informatics, 44(1), 163-179.
[5] De Vries, M. H., Monaghan, P., Knecht, S., & Zwitserlood, P. (2008). Syntactic structure and artificial grammar learning: The learnability of embedded hierarchical structures. Cognition, 107(2), 763-774. http://dl.acm.org/citation.cfm?id=1967546
[6] Pascual-Nieto, I., Santos, O. C., Perez-Marin, D., & Boticario, J. G. (2011, July). Extending Computer Assisted Assessment Systems with Natural Language Processing, User Modeling, and Recommendations Based on Human Computer Interaction and Data Mining. In IJCAI Proceedings-International Joint Conference on Artificial Intelligence (Vol. 22, No. 3, p. 2519).
[7] Wahl, H., Winiwarter, W., & Quirchmayr, G. (2010, November). Natural language processing technologies for developing a language learning environment. In Proceedings of the 12th International Conference on Information Integration and Web-based Applications & Services (pp. 381-388). ACM.
[8] de-la-Fuente-Valentín, L., Carrasco, A., Konya, K., & Burgos, D. (2013). Emerging Technologies Landscape on Education. A review. International Journal of Interactive Multimedia and Artificial Intelligence, 2(3), 55-70. http://dialnet.unirioja.es/descarga/articulo/4426252.pdf
[9] Felix, U. (2005). E-learning pedagogy in the third millennium: the need for combining social and cognitive constructivist approaches. ReCALL, 17(01), 85-100. http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=305554&fileId=S0958344005000716
[10] Benson, P. (2013). Teaching and researching: Autonomy in language learning. Routledge. http://books.google.com/books?id=ZoarAgAAQBAJ&printsec=frontcover
[11] Castellani, S., Kaplan, A., Roulland, F., & Roux, C. (2010). U.S. Patent No. 7,797,303. Washington, DC: U.S. Patent and Trademark Office. http://www.google.com/patents/US7797303
[12] Shi, C., Verhagen, M., & Pustejovsky, J. (2014). A Conceptual Framework of Online Natural Language Processing Pipeline Application. COLING 2014, 53. http://www.aclweb.org/anthology/W/W14/W14-52.pdf#page=63


[13] Najeeb, M. M., Abdelkader, A. A., & Al-Zghoul, M. B. (2014). Arabic Natural Language Processing Laboratory serving Islamic Sciences. International Journal. http://thesai.org/Downloads/Volume5No3/Paper_16-Arabic_Natural_Language_Processing_Laboratory_serving_Islamic_Sciences.pdf
[14] Richards JC, Rodgers T (2001) Approaches and Methods in Language Teaching. Second Edition. New York: Cambridge University Press
[15] Van den Branden (2012) Task-based language education. In: Burns A, Richards JC, The Cambridge Guide to Pedagogy and Practice in Language Teaching. New York: Cambridge University Press, 140–48.


An Upper Ontology for Benefits Management of Cloud Computing
Richard Greenwell*, Xiaodong Liu and Kevin Chalmers
Institute for Informatics and Digital Innovation,
Edinburgh Napier University, Edinburgh, UK

Abstract—Benefits Management provides an established approach for decision making and value extraction for IT/IS investments and can be used to examine cloud computing investments. The motivation for developing an upper ontology for Benefits Management is that the current Benefits Management approaches do not provide a framework for capturing and representing semantic information. There is also a need to capture benefits for cloud computing developments to provide existing and future users of cloud computing with better investment information for decision making. This paper describes the development of an upper ontology to capture greater levels of knowledge from stakeholders and IS professionals in cloud computing procurement and implementation. Complex relationships are established between cloud computing enablers, enabling changes, business changes, benefits and investment objectives.

Keywords—Ontology Generation; Benefits Management; Cloud Computing

I. INTRODUCTION
The Benefits Management approach has been developed over a number of years by researchers such as Ward and Daniel [1] and Peppard et al. [2]. The approach allows stakeholders to gain maximum business benefit from IS/IT investments by considering the linkage between investments and the business benefits they generate. Ward and Daniel's work [1] shows high levels of dissatisfaction with the benefits derived from IS/IT activities, with 81% of those surveyed having dissatisfaction with the evaluation and review of benefits and 75% having dissatisfaction with the planning and delivery of benefits respectively.

Cloud computing is a relatively new IS/IT technology that many organisations are beginning to use and are considering investing in. Organisations may not have considered how such investments will deliver business benefits. This paper examines Benefits Management in cloud computing, by using a number of primary and secondary case studies, to provide a unique contribution in this area. The Benefits Management approach is used to develop an ontology to structure the knowledge gained from the case studies.

The motivation for using an ontology is to abstract knowledge and reasoning from the ontology in order to develop a number of reasoning approaches based on attributes such as organisation size, type of cloud technology used and other factors to provide clear, precise and unambiguous definitions of the benefits derived from cloud computing. Future work will see the expansion of the ontology to allow multiple stakeholders to add further case studies. This will provide additional reasoning and scenarios to the ontology.

The main contributions of this paper are to advance the Benefits Management technique through the introduction of semantics and to develop an ontology from a number of case studies which can be accessed and edited by other researchers. The ontology developed for Benefits Management will allow a service to be created, which can be accessed and enhanced by multiple stakeholders.

This paper will now continue with a discussion of the Benefits Management approach. The application to cloud computing will then be presented as a number of enablers. The process of the generation of an upper ontology will then be described. A number of case studies are shown encoded into the ontology. The resulting ontology structure and conclusions will be described along with further work.

II. BENEFITS MANAGEMENT
The Benefits Management approach was developed out of a dissatisfaction with IS/IT projects' failure to deliver business value. Benefits Management is defined by Ward and Daniel [1] as “The process of organizing and managing such that potential benefits arising from the use of IS/IT are actually realized”. The approach concentrates on benefits delivery, obtaining value from investments and involving stakeholders. There is emphasis on change management, that is, the importance of IS/IT investments only delivering benefits through organisational change.

Ward and Daniel [1] describe the need for a common language and reference model in exploring benefits enabled by IS/IT investments. Using an ontology driven approach, multiple stakeholders can develop vocabularies, terms and semantics and map them to form a common discourse. The authors also describe the importance of context, while the semantic modelling and mapping tools help contributors to model context in the Benefits Management process.

The Benefits Management approach attempts to link IS/IT enablers such as new technology advances to create change in the organisation. The changes are termed enabling changes. IS/IT enablers are only useful if they enable change in the organisation. Enabling changes trigger business changes in the organisation that deliver benefits. The benefits meet clearly defined investment objectives. The Benefits Management process is encapsulated in the Benefits Dependency Network (BDN) shown in the figure below.
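The chain described in this section (enablers lead to enabling changes, which trigger business changes, which deliver benefits, which meet investment objectives) can be sketched as a small layered graph. This is only an illustrative sketch and not the authors' ontology: the class name `BenefitsDependencyNetwork`, its methods, the rule that links join adjacent layers only, and all example node names are assumptions introduced here.

```python
from collections import defaultdict

# Layer order follows the chain described in the text.
LAYERS = [
    "enabler",
    "enabling_change",
    "business_change",
    "benefit",
    "investment_objective",
]


class BenefitsDependencyNetwork:
    """Hypothetical minimal model of a Benefits Dependency Network."""

    def __init__(self):
        self.layer_of = {}             # node name -> layer name
        self.links = defaultdict(set)  # node name -> linked node names

    def add_node(self, name, layer):
        if layer not in LAYERS:
            raise ValueError(f"unknown layer: {layer}")
        self.layer_of[name] = layer

    def link(self, source, target):
        # Assumption: a link always joins one layer to the next one.
        src = LAYERS.index(self.layer_of[source])
        dst = LAYERS.index(self.layer_of[target])
        if dst != src + 1:
            raise ValueError("links must join adjacent layers")
        self.links[source].add(target)

    def objectives_reached_from(self, node):
        """Return the investment objectives reachable from `node`."""
        found, seen, stack = set(), set(), [node]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            if self.layer_of[current] == "investment_objective":
                found.add(current)
            stack.extend(self.links[current])
        return found


# Example with made-up node names (assumptions, not case-study data).
bdn = BenefitsDependencyNetwork()
chain = [
    ("IaaS", "enabler"),
    ("Move servers to virtual instances", "enabling_change"),
    ("Pay-per-use provisioning", "business_change"),
    ("Lower cost of ownership", "benefit"),
    ("Reduce IT spend", "investment_objective"),
]
for name, layer in chain:
    bdn.add_node(name, layer)
for (source, _), (target, _) in zip(chain, chain[1:]):
    bdn.link(source, target)

print(bdn.objectives_reached_from("IaaS"))
```

Tracing from an enabler to the objectives it supports mirrors the point made above that an enabler is only useful if it leads, through change, to a benefit that meets an investment objective.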
Future work will see the expansion of the ontology to allow


[Figure 1: chains of Enabler, Enabling Change, Business Change and Benefit boxes linked to Investment Objectives. Columns: Drivers, IS/IT Enablers, Enabling Changes, Business Changes, Business Benefits, Investment Objectives.]

Fig. 1. Benefits Dependency Network

A holistic approach should be taken in the Benefits Management approach. Intangible benefits should not be ignored. Previous approaches to examining the value delivered from IS/IT investment have concentrated on financial measures and may have ignored the full spectrum of benefits.

The classification of benefits is the next stage of the approach. A business case is presented to key stakeholders who will then make investment decisions. The more explicitly a benefit can be expressed, the easier it is to gain commitment to investment. A benefit expressed in financial terms will more easily gain acceptance than a benefit that is merely observable. Benefits can also be classified in terms of the next action for a given benefit: whether something new should be done, continued or stopped.

The final stage of the Benefits Management process is to classify benefits by investment type. High potential investments may deliver high value but carry high risk. Strategic investments are central to the success of the business. Key operational investments can be improved to increase productivity in the business. Support investments deliver the least value to the business and may be stopped if they become more expensive.

An issue with the current Benefits Management process is that it does not express the elements of the process (such as enablers, changes and benefits) and the relationships between the elements of the process in terms of semantics. A key contribution of this paper is to use an ontology to improve the knowledge representation within the Benefits Management process. New ontology tools such as WebProtégé [3] allow multiple stakeholders to build a Benefits Management ontology through collaborative developments.

In the next section IS/IT enablers seen in cloud computing will be considered.

III. CLOUD COMPUTING ENABLERS

A number of IS/IT enabling technologies have been identified in cloud computing. Provision models such as Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS) have been defined by NIST [4]. IaaS is the lowest level of enabler, where users procure hardware and operating system resources at a cost. PaaS brings together infrastructure, programming languages and data storage in a single package. SaaS provides customers with the ability to rent software packages on demand.

Ownership models of cloud resources have also been defined by NIST [4] as public clouds, private clouds and hybrid clouds. Public clouds are provided by third parties at an agreed service level and price. Private clouds use cloud technology to provide services to customers within an organisation. Hybrid clouds use both public and private clouds to provide services to customers.

Technologies built on cloud computing such as Big Data [5], Data Science [6] and storage services [7] are key enablers for generating change and benefits in cloud computing investments.

A number of enablers have been identified from primary and secondary sources [8][9][10][11][12][13][14] which are now discussed. Cost is a primary enabler: IaaS provides low cost of ownership and the ability to manage cost. Ease of movement from test to production is facilitated because a number of virtual instances can be procured and used to move from test to production. Large scale storage with low cost of ownership is provided by storage that can be purchased on demand and is managed and backed up in the cloud.

Alternative ways of working and new products are being created by cloud computing. Shared development spaces between organisations, especially in public clouds, can provide joint developments or provide greater customer intimacy. Organisations can create new products, especially on the PaaS platform. Data Science, Big Data and 'Smart Cities' [15] become feasible for small and medium size organisations. Flexibility of resources allows organisations to downsize or upsize on demand.

New markets and marketing can be accessed. The marketing power of cloud computing allows cloud solutions to be marketing tools, with an organisation's status improved by having a cloud computing solution. Many large corporations and government organisations require solutions to be cloud based, for example the United Kingdom's G-Cloud [16].

Private and public organisations can offer infrastructure and services to other organisations, to reduce ownership cost or to generate revenue. Public organisations can create cloud infrastructure for economic development [17].

There are a number of operational enablers in cloud computing adoption. Cloud storage and infrastructure solutions can be used to manage disaster recovery [18]. Infrastructure management tasks can be reduced, which allows employees to concentrate on more skilled work or to develop new skills. Cloud services can be delivered to a number of devices [19]. The security of the infrastructure can be improved [20].

IV. COLLABORATIVE ONTOLOGY CREATION

This section examines how ontologies are created, in particular upper ontologies that concentrate on a specific subject area, such as Benefits Management. Ontologies allow


the capture and formalisation of knowledge using semantics. Knowledge is represented as terminology (via description logics) and assertions represented in the terminology. This provides the ability to capture greater knowledge in the Benefits Management process. The ontology can be queried by languages such as SPARQL [21].

Benefits Management brings together a number of stakeholders in an organisation to consider how IS/IT enablers (such as cloud computing) can generate benefits. Research into ontologies and associated description logics has concentrated on biomedical research [22] and the novel application to new areas such as Benefits Management can be seen to deliver great value to organisations.

Upper ontologies are designed for narrowly defined subject areas. A number of upper ontologies have been developed, for example the Good Relations ontology [23] in commerce and NASA QUDT [24] for units of measurement. In any development of a Benefits Management ontology there will be a collaborative effort to define ontology structure and content.

Walk et al. [25] discuss collaborative ontology engineering projects. The collaborative approach is ideally suited to creating explicit specifications and shared conceptualisations of benefits derived from IS/IT investments from multiple stakeholders. Stakeholders can collaborate using tools such as WebProtégé [3] to work on the structure of the ontology (the terminology or T-BOX and the relational aspects of ontology or R-BOX) and the individual instances of the ontology (the assertions or A-BOX). Such tools allow auditing, change history and correctness of the ontology to be maintained. The process of ontology generation is more difficult than using off the shelf collaborative tools that allow Wikis or shared documents to be created, as technical help may be required to build a formally correct ontology. The creation of an upper ontology for Benefits Management should provide a template in the form of a complete or semi-complete Terminology Box (T-BOX) for stakeholders to use.

Sebastian et al. [26] describe an approach to collaborative ontology development using workflows. The researchers highlight the need to define formal workflows for non-ontology experts such as domain experts in the areas of medicine and gene research. This could be extended to business analysts or those working in the area of Benefits Management. The paper outlines a series of tasks that form a workflow for ontology generation, supported by an ontology that describes the process for creating an ontology. This allows those who are unfamiliar with the process of ontology generation to create an ontology from scratch using a collaborative method.

The importance of the change process in ontologies is the subject of the paper by Wang et al. [27]. In large scale ontology projects the ability to use and review a change process is part of the ontology building process. Ontology tools such as Protégé [28] and WebProtégé [3] include a change log. The change process is a key factor when a number of collaborators are working on a shared ontology.

The ontology engineering process is examined in Strohmaier et al. [29]. The researchers describe four aspects of ontology development: dynamic, social, lexical and behavioral.

The dynamic aspects describe how ontologies change over time. The researchers found that changes occurred in bursts around the project start-up date and during meetings between collaborators. The social aspects of ontology development see collaborators working in small groups of two or three people. The vocabulary of the ontology will stabilise as it becomes mature. This is described as the lexical aspect of the ontology development process and can be measured using a number of mathematical measures of texts such as word similarity or Vector Space Models (VSM) of corpora [30]. The behavioral aspects of ontology development describe how collaborators change the ontology over time. It was found that a change hierarchy saw developers modifying a high level concept and then going on to transform lower level concepts.

Tudorache [3] proposes the usage of WebProtégé as a collaborative ontology editing tool. The tool is lightweight in comparison to desktop computer based tools, such as the existing Protégé [28] tool. The WebProtégé tool allows information to be entered via structured input forms which should be familiar to non-technical users, such as domain specialists. The forms can be tailored to a number of user groups. There is support for collaborative working such as threaded discussions, change notifications and change statistics notice boards.

V. CLOUD COMPUTING CASE STUDIES

A number of case studies were developed from primary and secondary sources. The case studies deal with different aspects of cloud computing, as described in the table below.

TABLE I. ORGANISATIONS REVIEWED

Organisation | Type | Description
Organisation A [8] | Micro Start-up Company | Provides solutions to the music promotion industry using PaaS/public clouds.
Organisation B [8] | Actuarial Services Consultancy | Supplier of economic modelling reports using IaaS/PaaS on public/private clouds.
Organisation C [8] | Public Sector Division of Large Software Company | IaaS and SaaS solutions via private clouds.
Organisation D [8] | Public Sector Managed Services organisation | Shared service between two local authorities using IaaS/SaaS.
Organisation E (new primary research) | A large local authority | Adoption of IaaS in a large local authority with a commercial partner.
Organisation F [9] | Oil and gas company migration to IaaS | Migration from an in-house data centre.
Organisation G [10] | Media Group | Software as a service for distributed media workers.
Organisation H [11] | Quality of service for three cloud services | Study of factors affecting ranking of quality of service in IaaS.
Organisation I [12] | University's adoption of cloud technology | Cloud adoption in an educational context.


Organisation J [13] | Security benefits in cloud computing | Identification of security benefits in cloud computing.
Organisation K [14] | Implementation of cloud computing by doctors in South Africa | The benefits of using cloud computing to enable benefits such as better communication.

The methodology used in this paper was to extract Benefits Management information from each case study which was then used to build an upper ontology for Benefits Management. The table below shows the IS/IT enablers for each case study.

TABLE II. ENABLERS CROSS-REFERENCED TO CASE STUDIES

Enabler | Case Study
Cost | A-K
Ease of movement from test to production | A, B
Large scale storage with low cost of ownership | A, B, G
Shared development space | A, B
New products | A, B, E
Flexibility of resources | A, B, C, H, I
Marketing power of cloud computing | B, C
New markets and procurement models | C, D, E
Provide services to 3rd parties | C, D, E
Create infrastructure for start-up companies | E
Disaster recovery | E
Device independence and geographical distribution | G, I, K
Improve employee satisfaction | F, H
Improved security | E, J

The table above shows that many of the enablers were present in the organisations covered by the case studies. The enablers can be split into two groups, business and operational.

Cost was an enabler in all organisations. The usage of IaaS was seen as an enabler to reduce costs in the short-term and a major reason for the uptake of cloud computing. Repeated cost reduction may not be feasible in the long-term and other enablers should be examined.

There are a number of new products being created by cloud computing such as storage solutions, data science applications and development environments. These enable organisations to gain new customers and to enter new markets.

Marketing cloud service enablers allow organisations to attract new customers and to maintain existing customers who may move to cloud based solutions in the future.

Lower costs of market entry are afforded by cloud computing which utilises rental of resources. Organisations can enter new markets without capital expenditure and maintenance costs. Organisations with existing infrastructure or those who require high levels of fixed resources can sell excess capacity.

The freeing up of staff from repetitive and tedious infrastructure development and maintenance is one of the main benefits. The dis-benefit of redundancies from outsourcing to the cloud is acknowledged. Organisations D & E in the case studies are large local authorities which have successfully adopted cloud infrastructure and redeployed staff into new customer facing roles.

Public authorities, academic institutions and non-profit organisations can use cloud infrastructure to allow start-up organisations to develop. Organisation E has used this approach to generate economic development. The BDN for business enablers is shown below.

[Figure 2: Benefits Dependency Network chains for the business enablers Cost, New Products, Cloud as a Marketing Tool, New Markets, Services to 3rd Parties, Employee Satisfaction and Start-up Infrastructure. Columns: IS/IT Enablers, Enabling Changes, Business Changes, Business Benefits, Investment Objectives.]

Fig. 2. BDN for Business Enablers


The operational enablers are now described. The large-scale storage offered by cloud computing is a major operational enabler [31]. The backup, replication and disaster recovery of large amounts of data can be outsourced at a very low cost. Many organisations described in the case studies have large amounts of critical business data which is being moved into the cloud [32]. When low cost storage is combined with fast Internet connections an enabling cloud technology is created.

Resource procurement of hardware and software was previously a capital investment decision, requiring long-term planning, without the ability to adjust resources quickly as business needs change. The advent of cloud computing has seen the ability to purchase resources on-demand, through spot instances as well as through fixed resources to cope with base demand.

New approaches to the development of software solutions have been established using hybrid and public cloud technology. Organisation A has established a joint development environment with customers on a public cloud based platform. This has produced an operational approach that is more intimate with the customer and reduces operational risk through shared developments and cost.

Public and hybrid clouds enable organisations to create and store virtual machines at a low cost. Separate physical hardware and software is no longer required. Virtual machines can be moved from test to development more easily.

The provision of disaster recovery is an emerging market for cloud computing providers. Organisations will effectively outsource their disaster recovery operations to the cloud provider. This is advantageous because cloud storage is replicated and backed up multiple times across a number of geographical locations [33]. Virtual machines can be made ready to provide instant services if a company's own data centre is unavailable. Expertise can be concentrated at cloud providers that would be difficult to replicate outside large IS/IT providers.

Services can be accessed from a number of devices such as phone apps, tablets and desktop machines more easily using cloud based services [19]. The operational requirement to install and manage software and data falls on the cloud provider.

The high availability of data and secure access can be managed by the cloud provider. Systems and expertise will be more advanced than that afforded by small in-house providers. However, there are problems with outsourcing security due to loss of control of the organisation and conflict of interests if the cloud provider provides services to competitors.

The operational enablers for cloud computing are shown in the figure below.

[Figure 3: Benefits Dependency Network chains for the operational enablers Storage, Resource Flexibility, Shared Development Space, Movement From Test to Production, Disaster Recovery, Device Independence and Geographical Distribution, and Improved Security. Columns: Drivers, IS/IT Enablers, Enabling Changes, Business Changes, Business Benefits, Investment Objectives.]

Fig. 3. Operational Enablers

The benefits are classified in the table below. The financial benefits are centered on the lower cost of ownership from using utility infrastructure. New markets (such as government provision platforms) could be entered which would provide financial benefits.


Operational efficiencies provide further financial benefits such as the ability to create new environments and to outsource the management of computing resources. Reduced fixed costs will result from the move to a 'rental' model as opposed to spending money on internal IS/IT infrastructure.

Quantifiable benefits include improvements in service quality, with the ability of users to vary the amount of resources they use. The speed of functionality delivery and the availability of resources were improved. There may be internal staff reductions due to cloud computing infrastructure investments. The operational benefits of cloud based IS/IT such as the lowering of e-mail traffic and increased security in the cloud are measurable. Future benefits from new technologies seen in PaaS and enabling data science innovations can be measured using forecasting techniques.

The marketing benefits of cloud computing are important to many of the organisations. These benefits are difficult to measure in the short-term but are observable in internal and external marketing positions in the organisations.

TABLE III. CLASSIFICATION OF BENEFITS

Degree of Explicitness | Do New Things | Do Things Better | Stop Doing Things
Financial | Lower cost of ownership; Reduced fixed costs | Reducing time to create infrastructure | Managing own infrastructure
Quantifiable | Improved quality of service; Customer self service | Faster turnaround of new functionality; Speed of delivery; Availability improvements | Internal infrastructure; Direct employment of staff through infrastructure outsourcing
Measurable | Lower e-mail traffic; New markets for Big Data and Data Science | PaaS innovations; Security of data; Improve customer satisfaction | E-mail traffic; Storing information on individual computers
Observable | Better customer intimacy; Improved marketing; Move from project to product based solutions; Sell infrastructure and services outside the organisation | Actively market to customers | Waiting for customers to 'come to the organisation'

The cloud investment portfolio is shown below. The portfolio shows long-term strategic investments for organisations adopting cloud technologies such as infrastructure, services and storage. Private clouds are being developed and there is some development of hybrid technologies which utilise combinations of private and public cloud ownership.

High potential investments are riskier investments that may yield higher returns. Small innovative organisations may use PaaS to deliver unique products that will differentiate them from the mass market. Data science investments enabled by cloud computing promise high growth, but may be high risk due to the immaturity of the technology in this area.

Key operational investments will be supported in the short- to medium-term. Private clouds will be developed by organisations at high cost, based on in-house servers or on customers' hardware. Non-cloud and grid computing solutions will be supported in the short-term but will be replaced by cloud technologies due to cost and usability issues. Public clouds will be important in the short-term for many organisations; however, their ubiquity and low cost will not generate competitive advantage in the long-term.

TABLE IV. CLOUD INVESTMENT PORTFOLIO

Strategic: IaaS; SaaS; Cloud storage; Private/hybrid clouds
High Potential: PaaS; Marketing of Cloud Computing; Market places such as G-Cloud; Big Data & Data Science
Key Operational: Grid computing; Existing customers with their own hardware; Clustered in-house servers with cloud extensions; Private Cloud
Support: Shared Services; Non-cloud Based Software; Grid Computing; Public Cloud (long-term)

VI. AN ONTOLOGY FOR BENEFITS MANAGEMENT IN CLOUD COMPUTING

An ontology was generated from the case studies previously described. This provides a formal description of the Benefits Management terminology, relationships and assertions provided by the case studies.

The ontology was created so that the terminology can be reused across a number of projects. The terminology for the Benefits Management ontology has been uploaded to the WebProtégé website [34]. This allows the full ontology to be viewed, critiqued and used by other researchers. The assertions for the case studies described in this paper are held in a separate ontology file that can be supplied or uploaded on request. Also, the assertions can be overlaid on the terminology to provide a full ontology. The decision to separate the terminology and assertions was to allow for the reuse of the terminology.

A. Ontology Classes


[Figure 4: class diagram. Classes, each with name and description attributes: Enabler, Enabling Change, Business Change, Benefit, Objective, Application (High_potential, Key_operation, strategic, support), Action, Degree_of_explicitness (financial, quantifiable, measurable, observable) and Group\Person (Stakeholder, Customer, User). Relationships: triggers_enabling_change, triggers_business_change, creates_benefit, is_met_by, has_owner, has_application, has_action, specifies.]

Fig. 4. Overview of Main Ontology Classes

The figure above shows the main ontology classes (entities). A full description can be found in the WebProtégé project [34]. The Benefits Dependency Network forms the core of the ontology with semantic linkages between enablers, changes and benefits. Each entity can be related to the owner such as a group or stakeholder. Benefits can be linked to objectives and be classified or related to investment portfolio applications.

The entity names shown in parentheses are child entities. An expanded example for 'Degree_of_explicitness' is shown below.

[Figure 5: Degree_of_explicitness (name, description) with four child entities: Financial (name, description, amount, currency), Quantifiable (name, description, amount, unit_of_measure), Measureable (name, description, amount, unit_of_measure) and Observable (name, description, Observation).]

Fig. 5. Degree of Explicitness

B. Ontology Implementation

An example of the ontology class implementation (assertions) is shown below. The cost enabler sees lower input costs in the business. The organisation purchases on price. Cloud computing resources are treated as utilities which can be supplied by a large number of suppliers. This gives the benefit of lower cost to the business, which meets the business objective of competing on cost. This is a new strategic investment which can be expressed financially.
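The 'Degree_of_explicitness' hierarchy of Fig. 5 can be pictured as a small set of classes. The following is only a minimal sketch in Python, not the WebProtégé implementation: the attribute names follow Fig. 5 as far as they can be read from it, the `saving` values are taken from the cost example in Fig. 6, and the `summarise` helper is a hypothetical name introduced purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class DegreeOfExplicitness:
    """Parent entity: every child carries a name and a description."""
    name: str
    description: str


@dataclass
class Financial(DegreeOfExplicitness):
    amount: float
    currency: str


@dataclass
class Quantifiable(DegreeOfExplicitness):
    amount: float
    unit_of_measure: str


@dataclass
class Measurable(DegreeOfExplicitness):
    amount: float
    unit_of_measure: str


@dataclass
class Observable(DegreeOfExplicitness):
    observation: str


def summarise(degree: DegreeOfExplicitness) -> str:
    """Hypothetical helper: one-line summary of how a benefit is expressed."""
    kind = type(degree).__name__
    if isinstance(degree, Financial):
        detail = f"{degree.amount} {degree.currency}"
    elif isinstance(degree, (Quantifiable, Measurable)):
        detail = f"{degree.amount} {degree.unit_of_measure}"
    elif isinstance(degree, Observable):
        detail = degree.observation
    else:
        detail = "unclassified"
    return f"{kind}: {degree.description} ({detail})"


# Financial assertion from the cost example (values as shown in Fig. 6).
saving = Financial(name="saving", description="cost saving",
                   amount=30000, currency="UKP")
print(summarise(saving))
```

Modelling the children as subclasses mirrors the child relationship in the ontology: any query that asks for a degree of explicitness accepts all four kinds, while only the financial kind carries a currency.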


[Figure 6: instance diagram for the cost example. Enabler (name: Cost; description: Reduced cost by using IaaS). Enabling Change (name: low_input_costs; description: buy resources as utility). Business Change (name: utility_approach; description: Buy on price). Benefit (name: lower_cost; description: reduced cost). Objective (name: compete_on_cost; description: Allow organisation to compete on cost). Stakeholder (name: Infrastructure Manager; description: Owner of Cloud Infrastructure). Strategic (name: strategic_cost; description: strategic cost savings in short term). New (name: New_Iaas; description: adopt Iaas). Financial (name: saving; description: cost saving; amount: 30,000; currency: UKP). Relationships: triggers_enabling_change, triggers_business_change, creates_benefit, is_met_by, has_owner, has_application, has_action, specifies.]

Fig. 6. Ontology Implementation

C. SPARQL Queries

SPARQL [21] can be used to provide Benefits Management outputs from the ontology. The namespace prefix 'bm' signifies 'benefits management'. SPARQL traverses the semantic data held in the ontology to produce outputs.

Three examples of useful outputs from the Benefits Management approach identified in the literature, represented as SPARQL queries, are shown in the table below.

The 'Benefits Stream' query traverses the Benefits Dependency Network (BDN) to describe the linkage between enablers, change, benefits and objectives. The 'Stakeholder Analysis' query examines the relationships between benefits and their owners and the stakeholders' commitment to the benefits. The 'Dimensions of Competence' query examines the relationship between drivers in the business, such as the need to reduce costs in the business, and the ability to meet the drivers from competences within the business.
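To make the 'Benefits Stream' traversal concrete, the following sketch walks the same chain over a handful of triples in plain Python. It is an illustration under stated assumptions rather than the authors' tooling: the triples are transcribed from the cost example in Fig. 6, the direction of each property follows the triple patterns of the Benefits Stream query, and the names `TRIPLES`, `objects`, `subjects` and `benefits_stream` are hypothetical. A real deployment would run the SPARQL query against the ontology itself.

```python
# (subject, predicate, object) assertions for the cost example.
TRIPLES = [
    ("Cost", "triggers_enabling_change", "low_input_costs"),
    ("low_input_costs", "triggers_business_change", "utility_approach"),
    ("utility_approach", "creates_benefit", "lower_cost"),
    ("compete_on_cost", "is_met_by", "lower_cost"),
    ("lower_cost", "has_action", "New_Iaas"),
    ("lower_cost", "specifies", "saving"),
    ("lower_cost", "has_owner", "Infrastructure Manager"),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is asserted."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is asserted."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


def benefits_stream():
    """Follow enabler -> enabling change -> business change -> benefit,
    then attach the objective, action and degree of explicitness."""
    rows = []
    for enabler, predicate, enabling_change in TRIPLES:
        if predicate != "triggers_enabling_change":
            continue
        for business_change in objects(enabling_change, "triggers_business_change"):
            for benefit in objects(business_change, "creates_benefit"):
                for objective in subjects("is_met_by", benefit):
                    for action in objects(benefit, "has_action"):
                        for degree in objects(benefit, "specifies"):
                            rows.append((enabler, enabling_change, business_change,
                                         benefit, objective, action, degree))
    return rows


for row in benefits_stream():
    print(" -> ".join(row))
```

Each nested loop corresponds to one triple pattern in the query, so a chain is reported only when every link is asserted, which is the same join behaviour a SPARQL basic graph pattern has.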


TABLE V. SPARQL QUERIES FOR BENEFITS MANAGEMENT REPORTS

Benefits Stream [1] p. 102: a set of related benefits and their associated business and enabling changes and enabling IS/IT.

    # Requires a PREFIX declaration for bm: (namespace IRI not given here)
    SELECT ?enabler ?enablingchange ?businesschange ?benefit ?objective
           ?action ?degree_of_explicitness
    WHERE {
      ?enabler bm:triggers_enabling_change ?enablingchange .
      ?enablingchange bm:triggers_business_change ?businesschange .
      ?businesschange bm:creates_benefit ?benefit .
      ?objective bm:is_met_by ?benefit .
      ?benefit bm:has_action ?action .
      ?benefit bm:specifies ?degree_of_explicitness
    }

Stakeholder analysis [1] p. 179: stakeholder groups, their benefits, changes and commitments to change.

    # Requires a PREFIX declaration for bm: (namespace IRI not given here)
    SELECT ?owner ?benefit ?change ?commitment ?commitment_action
    WHERE {
      ?benefit bm:has_owner ?owner .
      ?benefit bm:needs_change ?change .
      ?change bm:has_commitment ?commitment .
      ?commitment bm:has_commitment_action ?commitment_action
    }

Dimensions of competence [1] p. 114: the different capabilities of the organisation (this will get competency type and description of competency).

    # Requires a PREFIX declaration for bm: (namespace IRI not given here)
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?driver ?competence ?type ?description
    WHERE {
      ?driver bm:has_competence ?competence .
      ?competence a ?class .
      ?class rdfs:label ?type .
      ?competence bm:description ?description
    }

VII. CONCLUSIONS

This paper has described the development of an ontology for Benefits Management in cloud computing. This has delivered two unique contributions. Firstly, the Benefits Management approach has been enhanced to include semantic constructs and a formal knowledge description. This is enhanced by using WebProtégé, which allows collaborative development.

Secondly, an ontology has been developed and populated with case studies that can be used as a 'service' for those interested in Benefits Management for cloud computing.

A. Advantages of Ontology for Benefits Management

There are a number of advantages in developing an ontology for Benefits Management in cloud computing. The usage of semantic modelling techniques improves the expressive quality of the tools found within Benefits Management. An example could be the Benefits Dependency Network (BDN) which has linkages between enablers, change and benefits that are more expressive than using a simple network pattern.

Description logics allow reasoning to take place across the ontology. This has been demonstrated using the SPARQL language. An example can be seen in the BDN where "Cloud Computing Enablers that create change for financial benefits for strategic investments" can be found. The reasoning mechanism is more powerful and flexible than that found in technologies such as relational databases, and the terminology, relationships and assertions can be changed in light of new knowledge. Knowledge can be 'created' by concepts such as multiple-inheritance of knowledge derived through reasoning.

The use of collaborative ontology tools such as WebProtégé is ideally suited to Benefits Management development. Stakeholders collaborate to define and edit terminology and assertions. The collaborative tools provide the change notification and auditing required in a multi-stakeholder environment.

B. Importance of Benefits Management for Cloud Computing

There has been heavy investment in cloud computing, which is set to increase over the next decade. It is important to consider the benefits cloud computing will bring to organisations. This paper has laid the foundations for considering what the likely benefits are and has structured them into an appropriate knowledge representation.

VIII. FUTURE WORK

A. Expansion of WebProtégé Ontology

The terminology of the Benefits Management ontology can be improved by an internal review by the authors and a peer review of the WebProtégé project, which is designed to provide a collaborative approach to ontology development.

A number of case studies have been analysed; however, further work is underway to add additional assertions to the ontology through the analysis of further case studies.

B. Usage of Ontology by Organisations

WebProtégé is designed for domain experts and non-technical knowledge engineers. Further work will involve the definition of input forms for the entry of benefits information. A number of client interfaces are being developed to provide rich user interfaces for non-expert users for the T-BOX.

The T-BOX for the ontology described in this paper can be downloaded from WebProtégé [34] and the R-BOX and A-BOX are available on request. Organisations can use the T-BOX to develop their own Benefits Management ontology (by defining an R-BOX and A-BOX) using WebProtégé or a user interface (which is under development).

Toolsets to allow easier Benefits Management ontology creation are required. Matching is a key approach which can be used to match enablers to changes and the benefits generated from stakeholder groups. Mao et al. [35] describe a mapping approach based on Vector Space Models (VSM) that allows textual descriptions of ontology elements to be mapped.

REFERENCES

[1] J. Ward and E. Daniel, Benefits Management: How to Increase the Business Value of Your IT Projects. Wiley, 2012.
[2] J. Peppard, J. Ward, and E. Daniel, "Managing the realization of business benefits from IT investments," MIS Quarterly Executive, vol. 6, no. 1, pp. 1–11, 2007.


[3] T. Tudorache, C. Nyulas, N. F. Noy, and M. A. Musen, “WebProtégé: A collaborative ontology editor and knowledge acquisition tool for the web,” Semantic web, vol. 4, no. 1, pp. 89–99, 2013.
[4] P. Mell and T. Grance, “The NIST definition of cloud computing,” 2011.
[5] D. Agrawal, “Towards the End-to-End Design for Big Data Management in the Cloud: Why, How, and When?,” presented at the BTW, 2013, pp. 15–16.
[6] I. T. Foster and R. K. Madduri, “Science as a service: how on-demand computing can accelerate discovery,” presented at the Proceedings of the 4th ACM workshop on Scientific cloud computing, 2013, pp. 1–2.
[7] C. Wang, K. Ren, W. Lou, and J. Li, “Toward publicly auditable secure cloud data storage services,” Network, IEEE, vol. 24, no. 4, pp. 19–24, 2010.
[8] R. Greenwell, X. Liu, and K. Chalmers, “Benefits Management of Cloud Computing Investments,” International Journal of Advanced Computer Science and Applications (IJACSA), vol. 5, no. 7, 2014.
[9] A. Khajeh-Hosseini, D. Greenwood, and I. Sommerville, “Cloud migration: A case study of migrating an enterprise it system to iaas,” presented at the Cloud Computing (CLOUD), 2010 IEEE 3rd International Conference on, 2010, pp. 450–457.
[10] N. Convery and K. Ferguson-Boucher, “Storing information in the cloud,” Bulletin of Information and Records Management Society, Issue, pp. 3–5, 2010.
[11] S. K. Garg, S. Versteeg, and R. Buyya, “A framework for ranking of cloud computing services,” Future Generation Computer Systems, vol. 29, no. 4, pp. 1012–1023, 2013.
[12] V. Chang and G. Wills, “A University of Greenwich Case Study of Cloud Computing,” E-Logistics and E-Supply Chain Management: Applications for Evolving Business, p. 232, 2013.
[13] M. Shi, “Capturing strategic competences: cloud security as a case study,” Journal of Business Strategy, vol. 34, no. 3, pp. 41–48, 2013.
[14] A. Coleman, M. E. Herselman, and M. Coleman, “Improving Computer-mediated Synchronous Communication of Doctors in Rural Communities through Cloud Computing A Case study of Rural Hospitals in South Africa.,” Oriental Anthropologists, vol. 13, no. 1, 2013.
[15] C. Perera, A. Zaslavsky, P. Christen, and D. Georgakopoulos, “Sensing as a service model for smart cities supported by internet of things,” Transactions on Emerging Telecommunications Technologies, vol. 25, no. 1, pp. 81–93, 2014.
[16] G-Cloud, “G-Cloud - Home of the UK Government G-Cloud Programme G-Cloud,” 2013. [Online]. Available: http://gcloud.civilservice.gov.uk/. [Accessed: 24-Nov-2014].
[17] J. Bughin, M. Chui, and J. Manyika, “Clouds, big data, and smart assets: Ten tech-enabled business trends to watch,” McKinsey Quarterly, vol. 56, 2010.
[18] O. H. Alhazmi and Y. K. Malaiya, “Evaluating disaster recovery plans using the cloud,” presented at the Reliability and Maintainability Symposium (RAMS), 2013 Proceedings-Annual, 2013, pp. 1–6.
[19] H. T. Dinh, C. Lee, D. Niyato, and P. Wang, “A survey of mobile cloud computing: architecture, applications, and approaches,” Wireless communications and mobile computing, vol. 13, no. 18, pp. 1587–1611, 2013.
[20] D. S. Marcon, R. R. Oliveira, M. C. Neves, L. S. Buriol, L. P. Gaspary, and M. P. Barcellos, “Trust-based Grouping for Cloud Datacenters: improving security in shared infrastructures,” presented at the IFIP Networking Conference, 2013, 2013, pp. 1–9.
[21] O. Hartig, “SPARQL for a web of linked data: semantics and computability,” in The Semantic Web: Research and Applications, Springer, 2012, pp. 8–23.
[22] D. Allemang and J. Hendler, Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL. Elsevier Science, 2011.
[23] M. Hepp, “Goodrelations: An ontology for describing products and services offers on the web,” in Knowledge Engineering: Practice and Patterns, Springer, 2008, pp. 329–346.
[24] H. Rijgersberg, M. van Assem, and J. Top, “Ontology of units of measure and related concepts,” Semantic Web, vol. 4, no. 1, pp. 3–13, 2013.
[25] S. Walk, P. Singer, M. Strohmaier, D. Helic, N. Noy, and M. Musen, “Sequential Usage Patterns in Collaborative Ontology-Engineering Projects,” arXiv preprint arXiv: 1403.1070, 2014.
[26] A. Sebastian, N. F. Noy, T. Tudorache, and M. A. Musen, “A Generic Ontology for Collaborative Ontology-Development Workflows,” in Proceedings of the 16th international conference on Knowledge Engineering: Practice and Patterns, Berlin, Heidelberg, 2008, pp. 318–328.
[27] H. Wang, T. Tudorache, D. Dou, N. F. Noy, and M. A. Musen, “Analysis of User Editing Patterns in Ontology Development Projects,” presented at the On the Move to Meaningful Internet Systems: OTM 2013 Conferences, 2013, pp. 470–487.
[28] Protege, “The Protégé Ontology Editor and Knowledge Acquisition System,” 2014. [Online]. Available: http://protege.stanford.edu/. [Accessed: 24-Nov-2014].
[29] M. Strohmaier, S. Walk, J. Pöschko, D. Lamprecht, T. Tudorache, C. Nyulas, M. A. Musen, and N. F. Noy, “How ontologies are made: Studying the hidden social dynamics behind collaborative ontology engineering projects,” Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 18–34, 2013.
[30] C. Manning, P. Raghavan, and Schutze, Introduction to Information Retrieval. Cambridge University Press, 2008.
[31] A. N. Toosi, R. N. Calheiros, and R. Buyya, “Interconnected Cloud Computing Environments: Challenges, Taxonomy, and Survey,” ACM Computing Surveys (CSUR), vol. 47, no. 1, p. 7, 2014.
[32] Y. Han, “On the clouds: a new way of computing,” Information Technology and Libraries, vol. 29, no. 2, pp. 87–92, 2013.
[33] Z. Wu, M. Butkiewicz, D. Perkins, E. Katz-Bassett, and H. V. Madhyastha, “Spanstore: Cost-effective geo-replicated storage spanning multiple cloud services,” presented at the Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, 2013, pp. 292–308.
[34] R. Greenwell, “Benefits Management Web-protege project,” 2014. [Online]. Available: http://webprotege.stanford.edu/#Edit:projectId=d9a84e6c-9448-4921-b38c-e5e4f0e0ef2c.
[35] M. Mao, Y. Peng, and M. Spring, “An adaptive ontology mapping approach with neural network based constraint satisfaction,” Web Semantics: Science, Services and Agents on the World Wide Web, vol. 8, no. 1, pp. 14–25, 2010.


Determining the Efficient Structure of Feed-Forward Neural Network to Classify Breast Cancer Dataset

Ahmed Khalid
Najran University, Community College
Department of Computer Science
Najran, KSA

Noureldien A. Noureldien
University of Science and Technology
Department of Computer Science
Omdurman, Sudan

Abstract—Classification is one of the most frequently encountered problems in data mining. A classification problem occurs when an object needs to be assigned to predefined classes based on a number of observed attributes related to that object.

Neural networks have emerged as one of the tools that can handle the classification problem. Feed-forward Neural Networks (FNNs) have been widely applied in many different fields as a classification tool.

Designing an efficient FNN structure with the optimum number of hidden layers and the minimum number of neurons per layer, given a specific application or dataset, is an open research problem.

In this paper, experimental work is carried out to determine an efficient FNN structure, that is, a structure with the minimum number of hidden-layer neurons, for classifying the Wisconsin Breast Cancer Dataset. We achieve this by measuring the classification performance using the Mean Square Error (MSE) while controlling the number of hidden layers and the number of neurons in each layer.

The experimental results show that the number of hidden layers has a significant effect on the classification performance, and that the best average classification performance is attained when the number of layers is 5 and the number of neurons per hidden layer is small, typically 1 or 2.

Keywords—Hidden Layers; Number of neurons; Feed Forward Neural Network; Breast Cancer; Classification Performance

I. INTRODUCTION
Classification is one of the most frequently encountered problems in decision making tasks. A classification problem occurs when an object needs to be assigned to predefined classes based on a number of observed attributes related to that object. Many problems in business, science, industry, and medicine can be treated as classification problems.

Neural networks have emerged as one of the tools that can handle the classification problem. Their advantage is that they are data-driven, self-adaptive methods: they can adjust themselves to the data without any explicit specification of a functional form for the underlying model, and they can approximate any function with arbitrary accuracy.

Artificial neural networks consist of an input layer of nodes, one or more hidden layers and an output layer. Each node in a layer has one corresponding node in the next layer, thus creating the stacking effect [1].

The hidden layer is a collection of neurons which provide an intermediate layer between the input layer and the output layer. Activation functions are typically applied to hidden layers.

Neural networks are biologically inspired and mimic the human brain. A neural network consists of neurons which are interconnected by connecting links, where each link has a weight that is multiplied by the signal transmitted in the network. The output of each neuron is determined by an activation function such as the sigmoid or step function. Usually nonlinear activation functions are used. Neural networks are trained by experience. When an unknown input is applied to the network, it can generalize from past experiences and produce a new result [2], [3].

Feed-forward neural networks (FNN) are one of the popular structures among artificial neural networks. These efficient networks are widely used to solve complex problems by modeling complex input-output relationships [4], [5].

Neural networks have been widely used for breast cancer diagnosis [6], [7], [8], and the FNN is commonly used for classification. Many studies have evaluated the effect of the number of neurons in the hidden layer [9], [10], [11], [12].

In this paper an experimental investigation was conducted to see the effect of the number of neurons and hidden layers of a feed-forward neural network on classification performance for the breast cancer dataset. The rest of the paper is organized as follows. Section II introduces the materials and methods. Section III presents the experiments and results. Section IV gives the discussion and conclusions.

II. MATERIALS AND METHODS
The performance analysis of an FNN consists of estimating the training and generalization errors. The result with the minimum estimated generalization error is used to determine an optimum for the application of the neural network model [13].

The feed-forward neural network is built using the Levenberg-Marquardt training algorithm, which is widely used in the classification literature [14], [15], [16]. The network architecture used is composed of nine neurons for the input layer and one neuron for the output layer. To achieve the paper objectives,


the number of hidden layers and the number of neurons per hidden layer are changed during the training and simulation of the network. A learning rate of 0.5 was used, and the maximum number of epochs allowed was 1000. The activation function used in the different layers of the neural network is logsig. The performance of the classification is measured by the mean square error (MSE), which is calculated by Equation (1):

$$\mathrm{MSE} = \frac{1}{N}\sum_{k=1}^{N} e(k)^{2} = \frac{1}{N}\sum_{k=1}^{N}\bigl(t(k) - a(k)\bigr)^{2} \qquad (1)$$

where t(k) is the target output, a(k) is the network output, e(k) = t(k) - a(k) is the error, and N is the number of samples.

In this paper the Wisconsin Breast Cancer Data (WBCD) is used, which has been analyzed by various researchers of medical diagnosis of breast cancer in the neural network literature [5], [16], [17], [18]. This data set contains 699 instances. The first attribute is the ID of an instance, the next 9 attributes (Clump Thickness, Uniformity of Cell Size, Uniformity of Cell Shape, Marginal Adhesion, Single Epithelial Cell Size, Bare Nuclei, Bland Chromatin, Normal Nucleoli and Mitoses) represent different characteristics of an instance, and the last attribute takes the values 2 and 4 (2 for benign, 4 for malignant). Each instance therefore belongs to one of 2 possible classes (benign or malignant).

In our classification experiments all 9 attributes are used. Each attribute has the domain 1-10. The data set was partitioned into a training set and a testing set: 80% of the data is used for training, and the remaining 20% is used for testing. The testing set was not seen by the neural network during the training phase; it is used only for testing the neural network after training.

III. EXPERIMENTS AND RESULTS
Five experiments are carried out. In the first experiment the network is trained with one hidden layer; the number of neurons in this hidden layer is varied from 1 to 20. The network was trained 30 times for each structure.

Table (1) shows the ordered minimum mean of MSE for the 30 training trials for each number of neurons.

TABLE I. THE MINIMUM MEAN OF MSE IN THE 1ST EXPERIMENT

Number of neurons    MSE mean of 30 trials
1                    0.0805
9                    0.0857
16                   0.0858
17                   0.0945
11                   0.0958
3                    0.0983
19                   0.0985

From Table (1), the numbers of neurons that achieve the best performance in terms of MSE mean are 1, 9, 16, 17, 11, 3 and 19.

In the second experiment, the network was trained with two hidden layers. In the first layer the numbers of neurons that achieved the best performance in the first experiment are used. For the second hidden layer the number of neurons is varied from 1 to 10. The network was trained 30 times for each possible pair of neurons. Table (2) shows the pairs of numbers of neurons in the two layers that achieve the minimum MSE mean.

TABLE II. THE MINIMUM MEAN OF MSE IN THE 2ND EXPERIMENT

Mean of MSE    No. of neurons in 2nd hidden layer    No. of neurons in 1st hidden layer
0.0894         1                                     17
0.0952         1                                     3
0.0943         1                                     11
0.0992         2                                     19
0.1020         2                                     16
0.1030         5                                     9
0.1435         2                                     1

From Table (2), the pairs of numbers of neurons (first layer, second layer) that achieve the best performance in terms of MSE mean are (17, 1), (11, 1), (3, 1), (19, 2), (16, 2), (9, 5) and (1, 2).

In the third experiment the network is trained with three hidden layers. The numbers of neurons in the first and second hidden layers are the pairs that gave the best performance in experiment two, as listed above. The number of neurons in the third hidden layer is varied from 1 to 10. The network was trained 30 times for each possible triple of neurons. Table (3) shows the numbers of neurons in the three layers that achieve the minimum MSE mean.

From Table (3), the triples of neuron numbers in the three layers that achieve the best performance in terms of MSE mean are, in order, (1, 2, 1), (3, 1, 1), (9, 5, 2), (19, 2, 4), (17, 1, 1), (16, 2, 5) and (11, 1, 1).

In a similar manner, in experiments 4 and 5 the network is trained using four layers and five layers respectively. Tables (4) and (5) show the numbers of neurons in the different layers that achieve the minimum MSE mean.
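The layer-by-layer procedure used across the five experiments can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: `train_and_score` is a hypothetical stand-in that returns a seeded toy MSE instead of training a Levenberg-Marquardt network, and the names `mean_mse` and `layerwise_search` are introduced here only for the example. Only `mse` follows directly from Equation (1).

```python
import random
from statistics import mean


def mse(targets, outputs):
    """Mean square error as in Equation (1): the average of (t(k) - a(k))^2."""
    if len(targets) != len(outputs) or not targets:
        raise ValueError("targets and outputs must be non-empty and equally long")
    return sum((t - a) ** 2 for t, a in zip(targets, outputs)) / len(targets)


def train_and_score(structure, trial):
    """Hypothetical stand-in for one training run (not the authors' code).

    A real version would train a network with the given hidden-layer sizes
    and return its MSE; here seeded noise plays the role of network outputs.
    """
    rng = random.Random(f"{structure}:{trial}")
    targets = [rng.choice((0.0, 1.0)) for _ in range(20)]
    outputs = [min(1.0, max(0.0, t + rng.uniform(-0.5, 0.5))) for t in targets]
    return mse(targets, outputs)


def mean_mse(structure, trials):
    """Average MSE over repeated runs of the same structure."""
    return mean(train_and_score(structure, trial) for trial in range(trials))


def layerwise_search(max_layers=5, keep=7, trials=30):
    """Return, for each depth, the list of structures kept at that depth."""
    # Experiment 1: one hidden layer with 1..20 neurons; keep the best few.
    ranked = sorted(range(1, 21), key=lambda n: mean_mse((n,), trials))
    structures = [(n,) for n in ranked[:keep]]
    history = [structures]
    # Later experiments: extend every kept structure by one layer, choosing
    # the new layer's size (1..10) that gives the lowest mean MSE.
    for _ in range(max_layers - 1):
        structures = [
            s + (min(range(1, 11), key=lambda n, s=s: mean_mse(s + (n,), trials)),)
            for s in structures
        ]
        history.append(structures)
    return history
```

Calling `layerwise_search()` with the defaults mirrors the paper's settings (20 candidate sizes for the first layer, 10 for each later layer, 7 structures kept, 30 runs each). The search is greedy: earlier layers are never revisited once chosen, so it explores only a small part of the space of possible structures.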


TABLE III. THE MINIMUM MEAN OF MSE IN THE 3RD EXPERIMENT

Mean of MSE   Neurons in 3rd   Mean of MSE   Neurons in 2nd   Mean of MSE   Neurons in 1st
(3 layers)    hidden layer     (2 layers)    hidden layer     (1 layer)     hidden layer
0.0810        1                0.1435        2                0.0805        1
0.1060        1                0.0951        1                0.0983        3
0.1115        2                0.1030        5                0.0857        9
0.1188        4                0.0992        2                0.0985        19
0.1344        1                0.0894        1                0.0945        17
0.1366        5                0.1020        2                0.0858        16
0.1835        1                0.0943        1                0.0958        11

TABLE IV. THE MINIMUM MEAN OF MSE IN THE 4TH EXPERIMENT

Mean of MSE   Neurons in 4th   Mean of MSE   Neurons in 3rd   Mean of MSE   Neurons in 2nd   Mean of MSE   Neurons in 1st
(4 layers)    hidden layer     (3 layers)    hidden layer     (2 layers)    hidden layer     (1 layer)     hidden layer
0.0310        2                0.1188        4                0.0992        2                0.0985        19
0.0303        1                0.0810        1                0.1435        2                0.0805        1
0.0468        1                0.1060        1                0.0951        1                0.0983        3
0.0606        2                0.1366        5                0.1020        2                0.0858        16
0.0660        2                0.1835        1                0.0943        1                0.0958        11
0.1183        4                0.1115        2                0.1030        5                0.0857        9
0.1228        1                0.1344        1                0.0894        1                0.0945        17

TABLE V. THE MINIMUM MEAN OF MSE IN THE 5TH EXPERIMENT

Mean of MSE   Neurons in 5th   Mean of MSE   Neurons in 4th   Mean of MSE   Neurons in 3rd   Mean of MSE   Neurons in 2nd   Mean of MSE   Neurons in 1st
(5 layers)    hidden layer     (4 layers)    hidden layer     (3 layers)    hidden layer     (2 layers)    hidden layer     (1 layer)     hidden layer
0.0300        2                0.0606        2                0.1366        5                0.1020        2                0.0858        16
0.0303        1                0.0303        1                0.0810        1                0.1435        2                0.0805        1
0.0323        4                0.0468        1                0.1060        1                0.0951        1                0.0983        3
0.0336        1                0.1183        4                0.1115        2                0.1030        5                0.0857        9
0.0821        2                0.0310        2                0.1188        4                0.0992        2                0.0985        19
0.1095        6                0.0660        2                0.1835        1                0.0943        1                0.0958        11
0.1231        4                0.1228        1                0.1344        1                0.0894        1                0.0945        17
Average:
0.0623                         0.0680                         0.1245                         0.1038                         0.0913
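The Average row of Table V can be rechecked from the per-structure values listed in the table. The short script below is only a verification sketch: the numbers are copied from Table V, and the variable names are introduced here for illustration.

```python
# Per-structure mean MSE values copied from Table V, grouped by network depth.
table_v = {
    1: [0.0858, 0.0805, 0.0983, 0.0857, 0.0985, 0.0958, 0.0945],
    2: [0.1020, 0.1435, 0.0951, 0.1030, 0.0992, 0.0943, 0.0894],
    3: [0.1366, 0.0810, 0.1060, 0.1115, 0.1188, 0.1835, 0.1344],
    4: [0.0606, 0.0303, 0.0468, 0.1183, 0.0310, 0.0660, 0.1228],
    5: [0.0300, 0.0303, 0.0323, 0.0336, 0.0821, 0.1095, 0.1231],
}
# Averages as printed in the last row of Table V.
reported = {1: 0.0913, 2: 0.1038, 3: 0.1245, 4: 0.0680, 5: 0.0623}

# Recompute the column averages and find the depth with the lowest average MSE.
averages = {depth: sum(values) / len(values) for depth, values in table_v.items()}
best_depth = min(averages, key=averages.get)

for depth in sorted(averages):
    print(f"{depth} hidden layer(s): recomputed {averages[depth]:.4f}, "
          f"reported {reported[depth]:.4f}")
print("lowest average MSE at depth", best_depth)
```

Assuming the table entries are transcribed correctly, the recomputed averages match the reported ones for one to four layers. For five layers the recomputed value is about 0.0630 rather than the reported 0.0623, a small difference that may come from rounding of the listed entries and that does not change the ranking: five layers still has the lowest average.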

IV. DISCUSSION AND CONCLUSIONS

The last row of Table (5) shows the average performance of the best 7 structures for each number of layers. This row shows that increasing the number of hidden layers from 1 to 3 gradually lowers the classification performance (the average MSE rises from 0.0913 to 0.1245), whereas the performance improves when the number of layers is 4 and 5 (average MSE of 0.0680 and 0.0623). Thus the number of hidden layers has a significant effect on the classification performance. Since both the best performance and the best average performance are attained when the number of layers is 5, this indicates that increasing the number of hidden layers leads to better classification performance for the Breast Cancer Dataset.

All tables show that the best performance is achieved when the number of neurons in the second and subsequent layers is small, typically 1 or 2.

So the final conclusion is that, to achieve better classification performance on the Breast Cancer Dataset using an FNN, 5 hidden layers should be used and the number of neurons per layer should be small, typically 1 or 2.

REFERENCES
[1] Andina and Pham, "Computational Intelligence: For Engineering and Manufacturing", Springer, 2007, ISBN10 0-387-374507.
[2] Martin T. Hagan, Howard B. Demuth, Mark H. Beale, "Neural Network Design", ISBN: 0-9717321-0-8.
[3] Eric Kandel, James Schwartz, "Principles of Neural Science", McGraw-Hill, fourth edition, 2000, ISBN 0-8385-7701-6.
[4] Karimi B, Menhaj M. B. and Saboori I, "Multilayer feed forward neural networks for controlling decentralized large-scale non-affine nonlinear systems with guaranteed stability", International Journal of Innovative Computing, Information and Control, vol. 6, no. 11, pp. 4825-4841, 2010.
[5] Zare Nezhad B, and Aminian A, "A multi-layer feed forward neural network model for accurate prediction of flue gas sulfuric acid dew points in process industries", Applied Thermal Engineering, vol. 30, no. 6-7, pp. 692-696, 2010.
[6] Tuba Kiyan, Tulay Yildirim, "Breast Cancer Diagnosis Using Statistical Neural Networks", Journal of Electrical and Electronics Engineering, 2004, vol. 4, Number 2, pp. 1149-1153.
[7] Sudhir D. Swarkar, Ashok Ghatol, Amol P. Pande, "Neural Network Aided Breast Cancer Detection and Diagnosis Using Support Vector Machine", Proceedings of the International conference on Neural Networks, Cavtat, Croatia, June 12-14, 2006, pp. 158-163.
[8] Paulin F, A.Santhakumaran A, "Classification of Breast cancer by comparing Back Propagation training algorithms", International Journal on Computer Science and Engineering, ISSN: 0975-3397, Vol. 3 No. 1, Jan 2011, pp 327-332.
[9] Bindu Garg, M.M. Sufian Beg, A.Q. Ansari, "Optimizing Number of Inputs to Classify Breast Cancer Using Artificial Neural Network", Journal of Computer Science & Systems Biology, JCSB, 2009, pp 247-254, ISSN: 0974-7230.
[10] M L Freedman and M L R Nierodzik, "Cancer and Age", 2007 by Elsevier Inc.
[11] Jun Zhang MS, Haobo Ma Md MS, "An Implementation of Guildford Cytological Grading System to diagnose Breast Cancer Using Naïve Bayesian Classifier", MEDINFO 2004, M. Fieschi et al. (Eds), Amsterdam: IOS Press.
[12] Shuxiang Xu, Ling Chen, "A Novel Approach for Determining the Optimal Number of Hidden Layer Neurons for FNN's and Its Application in Data Mining", 5th International Conference on Information Technology and Applications, ICITA2008, ISBN: 978-0-9803267-2-7, pp 683-686.
[13] Fujita O, "Statistical estimation of the number of hidden units for feed neural networks", Neural networks, 1998, pp 851-859.
[14] Dario Baptista, Sandy Rodrigues, Fernando Morgado-Dias, "Performance comparison of ANN training algorithms for classification", Intelligent Signal Processing (WISP), 2013 IEEE.
[15] Muhammad Ibn Ibrahimy, Md. Rezwanul Ahsan, Othman Omran Khalifa, "Design and Optimization of Levenberg-Marquardt based Neural Network Classifier for EMG Signals to Identify Hand Motions", Measurement Science Review, Volume 13, No. 3, 2013.
[16] Reynaldi, A.; Lukas, S.; Margaretha, H., "Backpropagation and Levenberg-Marquardt Algorithm for Training Finite Element Neural Network", Computer Modeling and Simulation (EMS), Nov 2012 Sixth UK Sim, ISBN: 978-1-4673-4977-2.
[17] Rudy Setiono, Huan Liu, "Neural-Network Feature Selector", IEEE Transactions On Neural Networks, vol. 8, No. 3, May 1997, pg 664-662.
[18] Wlodzislaw Duch, Rafal Adamczak and Krzysztof Grabczewski, "A New methodology of Extraction, Optimization and Application of Crisp and Fuzzy Logic Rules", IEEE Transactions on Neural Networks, vol. 12, No. 2, March 2001, pp. 227-306.


Use of Geographic Information System Tools in Research on Neonatal Outcomes in a Maternity-School in Belo Horizonte - Brazil

Juliano de S. Gaspar
Pós-Graduação em Saúde da Mulher
Faculty of Medicine of UFMG
Belo Horizonte, Brasil

Zilma S. N. Reis
Pós-Graduação em Saúde da Mulher
Faculty of Medicine of UFMG
Belo Horizonte, Brasil

Marcelo S. Júnior
Pós-Graduação em Saúde da Mulher
Faculty of Medicine of UFMG
Belo Horizonte, Brasil

Thabata Sá
Hospital das Clínicas of UFMG
Belo Horizonte, Brasil

Renato F. N. Júnior
Faculty of Production Engineering
Universidade Salgado de Oliveira
Belo Horizonte, Brasil

Raphael R. Gusmão
Faculty of Medicine of UFMG
Belo Horizonte, Brasil

Abstract—Aim: This study proposes to evaluate the spatial distribution of public obstetric care in the city of Belo Horizonte. It also correlates the primary care units (PCU) with the immediate neonatal outcomes of a maternity-school of Belo Horizonte, according to pregnancy risk and obstetric outcome. Method: Descriptive geographic-spatial research. This study analyzed a cohort of 2956 newborns who received care at birth in the maternity-school, Hospital das Clinicas (HC) of the Federal University of Minas Gerais (UFMG), between January 2013 and July 2014. The gestational risk, the location of the primary care unit (PCU) providing prenatal care, and the immediate neonatal outcome were studied. The QGIS 2.4 open source software was used to generate thematic maps and analyses. Results: Among the 2083 births analyzed, 1154 (55.4%) were classified as high maternal risk and 634 (30.4%) had a poor neonatal outcome. There was also a concentration of women living in the northwest of the city, the area whose mothers are officially referred to the maternity-school for childbirth. In cases of high-risk pregnancy and perinatal complications, referral also occurs from practically all other regions of the city. Discussion: The integration of hospital clinical and administrative data with cartographic databases in this study made clear the patterns of referral for childbirth in the maternity-school in high-risk pregnancy. Despite the limitations of a descriptive study, the analysis makes clear that the choice of place of childbirth goes beyond the rules set out in government planning of emergency obstetric referral by sanitary districts.

Keywords—Geographic Information Systems (GIS); Fetal Malformation; Health Indicators; Obstetrics Result; Primary Care Unit (PCU); Public Health

I. INTRODUCTION

The decentralization of healthcare through regional and hierarchically connected health services, while sustaining the management of care at the primary level, is a guideline of the Brazilian Health System (SUS) [1]. In the obstetric context, even though labor occurs in a hospital environment, its success must be seen as the final step of a process initiated in primary care. Gestational wellness begins with correct family planning, involves preconception evaluation, passes through early and effective prenatal assistance and, undeniably, through proper conditions for parturition. All these steps depend on an accessible, well-scaled and good-quality health care network [2, 3].

About 2.9 million annual births are estimated in the country. Among these, approximately 10.6 in every 1000 newborns die in the neonatal period, and 64.8 women die in each 100 thousand live births, according to data from the year 2011 [4]. These are mostly considered preventable deaths [5]. Such maternal and child indicators are still far behind the desired expectations and reflect the level of economic development, culture and technology of the country, and the current difficulties of prenatal and delivery care in our health services [6, 7].

Efforts have been undertaken to better understand the mechanisms that lead to poor maternal and neonatal results. These are studied here with special attention to high-risk pregnancy and to systematized care for pregnant women in labor [8].


A. Geographical Information Systems
Geography plays a key role in almost every decision that is made. The choice of locations, the targeting of market segments, the planning of distribution networks, the response to emergencies, or the redrawing of the limits of countries: all of these problems involve questions of geography. Spatial characteristics, such as topography and the geographical dispersion of the population, are factors in determining the equitable distribution of resources [9].

Geographic Information Systems (GIS) relate data to a spatial context and to their respective locations. This technology allows data with different degrees of complexity to be visualized on a map in a simple way. This often provides a useful way to reveal spatial and temporal relations between data. By combining data and applying some analytic rules, it is possible to create a pattern that helps answer the question previously asked [10].

Researchers, public health professionals responsible for setting policy, and others can use GIS to better understand geographic relationships. These relationships affect health outcomes, risks to public health, transmission of disease, access to health care and other public health concerns [11].

Many simple and functional GIS exist for exploring the distances between health resources and the population, bringing great benefits. Thus, questions such as how far the nearest hospital is from a given population, or where the nearest institution for blood donation is, can be answered easily and without hindrance.

The georeferenced displacement of women seeking obstetric care constitutes an important source of data on quality of care. Information about pregnancy and childbirth is important in assessing the quality of this assistance, as well as in providing efficient mechanisms for continuous monitoring of performance. This approach, grounded in real movement patterns, may allow health professionals and managers to monitor quality and to guide actions for the improvement of care. This will contribute to the targeting of strategies that are necessary to improve mother and child health indicators [12].

B. Aim
This study proposes to evaluate the spatial distribution of public obstetric care in the city of Belo Horizonte. It also correlates the primary care units (PCU) with the immediate neonatal outcomes of a maternity-school of Belo Horizonte, according to pregnancy risk and obstetric outcome.

II. METHODS

A. Study design
Descriptive geographic-spatial research. This study analyzes a cohort of 2956 newborns who received care at birth in the maternity-school, Hospital das Clinicas (HC) of the Federal University of Minas Gerais (UFMG), between January 2013 and July 2014.

B. Variables
Data were extracted from an obstetric information system (SISMater®) [13]: the gestational risk classification according to the SUS criteria [8], the location of the primary care unit (PCU) providing prenatal care, and the immediate neonatal outcome of interest.

Perinatal death, fifth-minute Apgar score less than seven, birth weight <2500g, premature delivery (less than 32 weeks) or admission to the neonatal unit were grouped as poor neonatal outcome.

Data from the Municipal Department of Health of Belo Horizonte (SMSA-BH) (SM-Saúde-BH, 2014) were also used: the number of maternities and PCU (Primary Care Units) existing in the city and their respective addresses. Furthermore, the service capacity of each PCU was estimated from its number of family health teams. The data refer to the year 2014 and define the offer of public obstetric care in the city, including seven maternities and 148 PCU.

C. GIS Software
The QGIS 2.4 open source software was used to generate thematic maps. Maternities, PCU and the places of residence of the women were geographically mapped using the official cartographic design of the city of Belo Horizonte.

Fig. 1. GIS layers defined. Legend: PCU - Primary Care Unit

Independent layers were designed with the specific features selected in this analysis, as well as those with maternal and neonatal outcomes (Fig. 1).

D. Ethical aspects
The present study was approved by the Ethics Committee for Research of the UFMG under the registration number (Brazil Platform: CAE 0550.4612.9.0000.5149), and all human research principles were respected. The proposal was presented to and supported by the Hospital das Clinicas of the UFMG and the Health Informatics Center of the Medicine School, UFMG.

This project was financially supported by:
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
FAPEMIG - Fundação de Amparo a Pesquisa de Minas Gerais (APQ-00879-12)

III. RESULTS

A. Data Quality Analysis
Official data for the year 2010 indicate a prenatal coverage of 98.9%, and 99.7% of the 31,147 annual births took place in hospitals [14]. In the study period a total of 2956 births occurred in the Hospital das Clinicas. Of these, 366 (12.4%) were excluded from this study because the women lived in other cities. This shows the role of this unit as a reference for delivery in this community. Also excluded were 78 records of women who received prenatal assistance in the private network, 397 records without indication of the PCU that provided prenatal care, and 32 records of women who did not have prenatal assistance (Table 1).

TABLE I. DATA QUALITY ANALYSIS

Data description                          N      %
Prenatal in maternity-school from HC      1113   37.7
Prenatal from PCU in the city             970    32.8
Without prenatal information              397    13.4
Prenatal performed in other city          366    12.4
Prenatal performed in private network     78     2.6
Without prenatal                          32     1.1
Total                                     2956   100.0
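
The exclusion arithmetic behind Table I can be checked with a few lines of Python. This is only an illustrative sketch: the counts are copied from the table, while the dictionary keys and variable names are hypothetical labels introduced here.

```python
# Counts copied from Table I; the key names are illustrative labels only.
records = {
    "prenatal_hc": 1113,
    "prenatal_pcu": 970,
    "no_prenatal_info": 397,
    "other_city": 366,
    "private_network": 78,
    "no_prenatal": 32,
}

# Groups the text describes as excluded from the analysis.
excluded_groups = ("no_prenatal_info", "other_city", "private_network", "no_prenatal")

total = sum(records.values())
excluded = sum(records[name] for name in excluded_groups)
analyzed = total - excluded

# Share of each group in the full cohort, rounded to one decimal as in the table.
percent = {name: round(100 * count / total, 1) for name, count in records.items()}

print(total, excluded, analyzed)
print(percent)
```

Under these counts, the births kept for analysis are those with prenatal care in the maternity-school or in a PCU of the city.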

B. Patient Health
The city has about 2.67 million people [14]. The spatial distribution of public obstetric care was plotted (Fig. 2). It is composed of 148 PCU (Fig. 3) and seven maternities, amply distributed throughout the city. In this figure the influence area of the maternity-school of Hospital das Clinicas was highlighted in red.

Fig. 2. Maternity Units and PCU influence area

Among the 2083 births analyzed, 1154 (55.4%) were classified as high risk for maternal and/or neonatal complications. As for the place of prenatal care, only 1113 (53.4%) had it in the university service itself.

Fig. 3. Primary Care Unit (PCU) distribution by the size of potential service (number of professionals)

The spatial relationships between the PCU where the mothers had prenatal care for the 2083 newborns in this city are presented in Fig. 4.

Fig. 4. Distribution of births with prenatal performed on PCU of women that had assistance in maternity-school, UFMG

There was a concentration in the northwest region of the city, because that is where the PCU that officially refer their patients for births to the maternity studied are located.


Among the births analyzed, 363 had low weight at birth (< 2500 g) and 237 were premature. Table 2 summarizes the neonatal outcomes of the births evaluated.

TABLE II. NEONATAL OUTCOME IN MATERNITY OF HOSPITAL DAS CLINICAS, UFMG (N=2083)

Poor neonatal outcome characteristics     N      %
Low weight at birth (<2500g)              363    17.4
ICU neonatal stay                         360    17.2
Premature delivery                        237    11.3
Apgar <7 (5th minute)                     115    5.5
Poor neonatal outcome                     634

However, note that referrals also come from practically all other regions of the city, especially for pregnancies at high risk for maternal and neonatal morbidities (Fig. 5).

Fig. 5. Spatial distribution of gestational risk of women that had assistance in maternity-school, UFMG by PCU

About a quarter of the analyzed population had some type of neonatal complication. The spatial distribution of poor neonatal outcome (Fig. 6) takes place throughout the city, with some areas of concentration. One is the Northwest region itself, considered the coverage area of the maternity-school. Others are to the north, in the surroundings of large public hospitals that also assist childbirth. Noteworthy is the large number of women from the Eastern region and from the territorial limits of the city to the North, East and Northwest.

Fig. 6. Spatial distribution of poor neonatal outcome of women that had assistance in maternity-school, UFMG

Fig. 7. Percentage distribution of births with poor neonatal outcome by PCU of women that had assistance in maternity-school, UFMG by PCU

The percentage distribution of births with poor neonatal outcome and the spatial relationships between the PCU where the mothers had prenatal care are plotted in Figure 7. It is relevant to highlight the lesser importance of the maternity-school as a reference for women living in the South-central region. In these places there is also a smaller supply of public health services, coincident with its feature of


commercial and high socioeconomic standard. The surroundings of the maternity-school, called the Health Campus, are the site of a high concentration of health facilities providing care in the private sector.

IV. DISCUSSION
The spatial organization of the provision of health services in this city is managed by the local public administration. It reflects the structure of the levels of assistance for health care, as in the model of the Family Health Program, being implemented in Brazil since 1992 [15, 16]. The city is divided into nine sanitary districts corresponding to the regional administration units and the administrative-welfare organization of the public health service [17]. The Northwest Health District holds 10 of the 12 PCU that refer pregnant women with priority to the maternity-school. This analysis confirms the role of this service as an important center for emergency obstetric care for women living in this region.

Conversely, women with high-risk pregnancies come indistinctly from all nine sanitary districts. Many of the highly complex features offered by this university reference center are unique in the state of Minas Gerais, such as Fetal Medicine care. There is a significant prevalence of fetal anomalies focused on childbirth in this unit, reaching 9.6% of births and high-risk obstetrical cases in general [13]. Other studies show that in Brazil there are many serious pregnancy complications during childbirth. These are associated with near miss and maternal death, such as hemorrhage, sepsis, and obstructed labor. Their prognosis is defined by the technological resources and quality of services offered by tertiary and quaternary hospitals [18]. This justifies our findings of a clear demand for quality care in high-risk situations. Even living within close proximity to other public hospitals, many women sought the solution of health issues of high complexity in the maternity-school, rather than a larger displacement.

One of the aims of this study is the quest for recognition and analysis of geographic and logistical access routes for women with high gestational risk. Another is to identify the resources of an appropriate tertiary and quaternary care unit complex, where public risk is. By mapping geographically the maternity-school and the origin of the displacements of the most serious perinatal cases in search of this service, we found influence of the university service throughout the country. It also serves women living in other nearby cities [13], but this was not the object of this study.

A limitation of this approach is the absence of markers of socioeconomic disadvantage and population density among the layers used in the maps. Certainly the future exploration of spatial relationships of these factors with the obstetric outcome will still yield relevant information. However, this is an unprecedented review of this community and context. It makes clear the potential of integrating hospital and government information for understanding obstetric outcomes through georeferencing techniques. It is known that the accessibility of health services may be an important outcome of reorganizing care for pregnant women and newborns. A study conducted in the city of Recife (Brazil) showed that proximity to residence or work has been the most important element in choosing the health unit for performing prenatal care [19]. In the case of Belo Horizonte, the reasons for displacement exceeded the logic of proximity of access or even the municipal planning of referencing for obstetric emergencies. It also reflected the search for the technological resources and expertise of a university hospital to offer a better approach to risk cases.

The guarantee of care by level of attention and its timely access from an adequate pregnancy risk rating, and a system of referencing and transportation for mother and child, are fundamental effective strategies. These seek reduction of maternal and neonatal morbidity or mortality [1, 20]. We expect to have contributed a model of analysis that can support more effective strategies for maternal and child care.

V. CONCLUSION
The integration of clinical and administrative data with the cartographic base of the city, through this study, was able to make clear the patterns of referencing for childbirth in the maternity-school in high-risk pregnancy. Despite the limitations of a descriptive study, the analysis makes clear that the choice of place of childbirth exceeds the matters set out in government planning of emergency obstetric referencing by sanitary districts.

ACKNOWLEDGMENT
The researchers acknowledge the support given by the Perinatal Commission of Belo Horizonte.

REFERENCES
[1] Victora, C.G., Aquino, E.M.L., et al., Maternal and child health in Brazil: progress and challenges. The Lancet. 377(9780): p. 1863-1876. 2011.
[2] Brasil, M.d. Saúde, and D.d.A.P. Estratégicas, Política Nacional de Atenção Integral à Saúde da Mulher Princípios e Diretrizes, M.d. Saúde, Editor Ministério da Saúde: Brasília. p. 82. 2007.
[3] Brasil and M.d. Saúde, Manual dos comitês de prevenção do óbito infantil e fetal, in Normas e Manuais Técnicos, M.d. Saúde, Editor Ministério da Saúde do Brasil: Brasília. 2004.
[4] MS-Brazil. DATASUS: Sistema de Informação em Saúde do SUS. 2014 [cited 13/01/2014]; Available from: http://www2.datasus.gov.br/DATASUS.
[5] Brasil, Manual dos comitês de mortalidade materna, S.d.A.à. Saúde, Editor: Brasília. p. 104. 2007.
[6] Reis, Z.S.N., et al., Análise de indicadores da saúde materno-infantil: paralelos entre Portugal e Brasil. Revista Brasileira de Ginecologia e Obstetrícia. 33: p. 234-239. 2011.
[7] MS-Brasil. Gestantes receberão auxílio financeiro para deslocamento. 2012 3/8/2012 [cited 2014 13/01/2014]; Available from: http://dab.saude.gov.br/portaldab/noticias.php?conteudo=_&cod=1486.
[8] - Ministério da Saúde do Brasil: Brasília. p. 302. 2010.
[9] Leitner, H., et al., Models for making GIS available to community organizations: dimensions of difference and appropriateness. In Community Participation and Geographic Information Systems: p. 37-52. 2002.
[10] Maged, N. and B. Kamel, Towards evidence-based, GIS-driven national spatial health information infrastructure and surveillance services in the United Kingdom. 2004.
[11] CDC. Centers for Disease Control and Prevention. 2013 [cited 2013 23/10/2013]; Available from: http://www.cdc.gov/.
[12] MS-Brasil, Política Nacional de Atenção Integral à Saúde da Mulher Princípios e Diretrizes. Ministério da Saúde: Secretaria de Atenção à Saúde Departamento de Ações Programáticas Estratégicas. Série C. Projetos, Programas e Relatórios (1 ed). 2009.


[13] Gaspar, J., et al., Maternal and Neonatal Healthcare Information System: Development of an Obstetric Electronic Health Record and Healthcare Indicators Dashboard, in Information Technology in Bio- and Medical Informatics, M. Bursa, S. Khuri, and M.E. Renda, Editors. Springer Berlin Heidelberg. p. 62-76. 2013.
[14] Beale, T. and S. Heard, openEHR Architecture–Architecture Overview, 2008. 2010.
[15] Victora, C.G., et al., Health conditions and health-policy innovations in Brazil: the way forward. The Lancet. 377(9782): p. 2042-2053. 2011.
[16] Paim, J., et al., The Brazilian health system: history, advances, and challenges. The Lancet. 377(9779): p. 1778-1797. 2011.
[17] SM-Saúde-BH. Áreas de Abrangência dos Centros de Saúde. 2014 [cited 2014]; Available from: http://gestaocompartilhada.pbh.gov.br/estrutura-territorial/areas-de-abrangencia-dos-centros-de-saude.
[18] Beale, T., et al., OpenEHR architecture overview. The OpenEHR Foundation. 2006.
[19] Kalra, D., Electronic health record standards. Yearb Med Inform: p. 136-144. 2006.
[20] MS-Brasil, Portaria Nº 1.459, 24/06/2011 - Institui, no âmbito do Sistema Único de Saúde - SUS - a Rede Cegonha., M.d.S.d. Brasil, Editor Ministério da Saúde: Brasília. 2011.


A Grammatical Inference Sequential Mining Algorithm for Protein Fold Recognition

Taysir Hassan A. Soliman
Associate Professor
Information Systems Dept.
Faculty of Computers & Information
Assiut University, Egypt

Ahmed Sharaf Eldin
Professor
Information Systems Dept.
Faculty of Computers & Information
Helwan University, Egypt

Marwa M. Ghareeb
Lecturer
Information Systems Dept.
Faculty of Computers & Information
Modern Academy, Egypt

Mohammed E. Marie
Lecturer
Information Systems Dept.
Faculty of Computers & Information
Helwan University, Egypt

Abstract—Protein fold recognition plays an important role in computational protein analysis since it can determine the function of proteins whose structure is unknown. In this paper, a Classified Sequential Pattern mining technique for Protein Fold Recognition (CSPF) is proposed. The CSPF technique consists of two main phases: the sequential pattern mining phase and the fold recognition phase. In the sequential pattern mining phase, the Mix & Test algorithm is developed based on Grammatical Inference and is used as a training phase. The Mix & Test algorithm minimizes I/O costs by using one database scan, discovers subsequence combinations directly from sequences in memory without searching the whole sequence file, has no database projection, handles gaps, and works with variable-length sequences without having to align them. In addition, a parallelized version of the Mix & Test algorithm is applied to speed up its performance. In the fold recognition phase, unknown protein folds are predicted via a proposed testing function. To test the performance, 36 SCOP protein folds are used, where the accuracy rate is 75.84% for training data and 59.7% for testing data.

Keywords—Data mining; grammatical inference; sequential mining; protein fold recognition

I. INTRODUCTION
Protein fold recognition is an important step towards understanding protein three-dimensional structures and their biological functions. Fold recognition techniques do not require similar sequences in the protein databank, just similar folds. Successful approaches have been applied to protein fold recognition [1]. For example, various researchers used neural networks to predict protein folds, such as GeneThreader [2], TUNE (Threading Using Neural nEtwork) [3], neural networks with tailored early-stopping [4], Bayesian Networks [5], structural-pattern based methods [6], and Genetic Algorithms [7,8]. Support Vector Machines (SVM) have been used to directly predict the alignment accuracy of a sequence template alignment [9] and in a combined technique of an SVM classifier with Regularized Discriminant Analysis (RDA) [10]. Other research has been performed using Monte Carlo methods [11]. In addition, many researchers used parallel evolutionary algorithms for protein fold recognition, such as parallel EST, probabilistic roadmap for motion planning, and pRNAPredict for RNA secondary structure [12-16]. However, although significant improvement has been made, the accuracy of the existing methods remains low and there is a need for new methods contributing to the field of fold recognition.

Sequential mining algorithms have been proposed to predict protein folds. The objective of sequential pattern mining is to discover interesting sequential patterns in a sequence database. It is one of the essential data mining tasks, widely used in many applications, including customer purchase pattern analysis and biological data sequences [17-22]. Much research has addressed efficient sequential pattern mining [23-25], closed and maximal sequential pattern mining [26-29], constraint-based sequential pattern mining [30-32], approximate sequential pattern mining [33], sequential pattern mining in multiple data sources [34], sequential pattern mining in noisy data [35], incremental mining of sequential patterns [36], and time-interval weighted sequential pattern mining [37]. Two of the general sequential mining algorithms are SPADE [24] and PrefixSpan [23], which are more efficient than others in terms of processing time. SPADE is a vertical-format based algorithm and uses equivalence classes in the mining process. PrefixSpan is a pattern-growth approach. It recursively projects a sequence database into a set of smaller projected sequence databases and grows sequential patterns in each projected database by exploring only the locally frequent fragments. The cSPADE [38] algorithm is a straightforward extension of SPADE. The only difference is the involvement of constraints in cSPADE. These constraints include length, width, and duration limitations on the sequences, item constraints, event constraints, and incorporating class information. In addition, a SPADE-based algorithm called SPAM (Sequential PAttern Mining) [39] has been proposed. It integrates the ideas of GSP, SPADE, and


FreeSpan and combines a vertical bitmap representation of the database with efficient support counting.

One of the promising areas is Formal Language Theory and Grammatical Inference (GI), which is playing an important role in the development of new methods to process biological data [40]. Many works propose GI techniques to tackle bioinformatics tasks, such as secondary structure identification [41], protein motif detection [42], and optimal consensus sequence discovery [43]. In this paper, GI is used as the backbone of the sequential pattern mining algorithm, which has achieved faster performance and higher accuracy than other sequential pattern mining algorithms for protein fold recognition.

In this paper, we introduce a Classified Sequential Pattern mining technique for Protein Fold Recognition (CSPF). CSPF consists of two main phases: 1) sequential pattern mining and 2) fold recognition. It handles gap constraints, uses data parallelization, and performs incremental updating. CSPF has shown efficient results when applied to 36 SCOP protein folds. This paper is organized as follows: Section 2 explains the proposed CSPF technique. Section 3 describes the datasets used and the performance study. Finally, Section 4 gives the conclusions and future work.

II. METHODS
The CSPF technique consists of two main phases: the sequential pattern mining phase and the fold recognition phase. In the sequential pattern mining phase, the Mix & Test algorithm is developed, which is used as a training phase. In the fold recognition phase, unknown protein folds are predicted via a proposed testing function. Our work is close to the sequential pattern mining suggested in [13]. However, this work depends on a new algorithm for sequential pattern mining, based on grammatical inference. In addition, it employs parallel sequential pattern mining and incremental updating.

A. Phase I: Sequential Pattern Mining:
During this phase, the Mix & Test algorithm is developed in order to mine sequential patterns for each fold, based on Grammatical Inference. The key advantages of the Mix & Test algorithm are minimizing I/O costs via one database scan, discovering combinations directly from sequences in memory without searching the whole sequences file, no database projection, handling gaps, and working with variable-length sequences without having to align them. In addition, the Mix & Test algorithm supports incremental updating: it does not prune infrequent patterns and counts their support during the mining steps. The Mix & Test algorithm acts iteratively. First, it generates a list of no-gap sequential combinations, which will serve as the seed for the coming generation if there is a gap value specified. If no gap is specified, this list will be evaluated by the testing strategy with the specified minimum support threshold, yielding the frequent and infrequent lists. If the gap value is specified, Mix & Test will loop to the combinations generation step and will use the combinations list obtained from the previous step to construct a new combinations list with a gap by following the steps of the Mix & Test algorithm's grammar.
The steps of the algorithm are shown in Fig. 1.

1) Mix Strategy:
Problem Definition: Given a sequences file S that contains a set of sequences $S = \{s_1, s_2, \dots, s_m\}$ and a set of items $I = \{i_1, i_2, \dots, i_n\}$ that may appear in any sequence (here, a set of amino acids), where m is the number of sequences in a file and n is the number of amino acids. A sequence $s_j = \langle i_1, \dots, i_n \rangle$, where $i_1$ is the first item in the sequence and $i_n$ is the last item in the sequence. Let P be a subsequence that is derived from $s_j$, $P_t$ the currently generated subsequence, and $P_{t-1}$ the previously generated subsequence. The first generated subsequence will be:

$P_1(s_j) = i_{n-1} \,\&\, i_n$    (1)

The generated subsequence will be:

$P_t(s_j) = i_{n-t} \,\&\, P_{t-1}(s_j)$    (2)

1. Read new protein sequences
2. Apply Mix strategy to generate sequential combination
3. If new combination then
       Add new combination to ArrayList with support = 1
   Else
       Increase its support by 1
4. If end of sequences file then
       If stopping criterion is reached (No_of Max gaps) then
           If combination's support >= Minsup then
               Output frequent sequential patterns
           Else
               Output infrequent sequential patterns
       Else
           GOTO step 3
   Else
       GOTO step 2

Fig. 1. Mix & Test Algorithm Flowchart

Sequential combinations Generation "No-Gap combinations"
Mix strategy first generates the list of all "no gap combinations". It starts by reading the first sequence of the protein sequences file and generates all possible sequential combinations of it. Mix strategy inserts each generated combination into the "no gap combinations" list with support equal to 1. Mix strategy loops through each newly generated P to generate all possible combinations of it, using a removing procedure. This procedure removes the last item of the last generated combination to get a new combination from the current P. It stops generating $P_t$ when t equals the number of items n in the sequence. An example of the generated sequential combinations of "No-gap combinations" is illustrated in Table I, given the original sequence MAKNNGCDP. After generating all possible sequential combinations from the first sequence of the protein sequences file, it reads the second sequence, goes through the previous steps, and generates all new combinations. If the newly generated combination was composed previously, its support is incremented by one; otherwise, it


will be inserted into the "no gap combinations" list with support equal to 1, as clarified in Fig. 1.

Gapped Sequential combinations Generation
If there is a gap value specified, the "no gap combinations" list will be used to generate the "one gap combinations" list, which will be used to generate the "two gaps combinations" list, and so on. Mix strategy uses two procedures to generate all possible gapped sequential combinations: the Ladder and Crisscross procedures.

First, the Ladder procedure reads each combination in the "no gap combinations" list and loops through it, inserting one gap at a time, starting from the second character position and shifting right in each loop until reaching the last character of the combination. Then, it reads the next no-gap combination and applies the previous steps to it.

TABLE I. LIST OF GENERATED SEQUENTIAL COMBINATIONS "NO-GAP COMBINATIONS"

SUBSEQUENCE   LIST OF GENERATED COMBINATIONS
P1            DP
P2            CD, CDP
P3            GC, GCD, GCDP
P4            NG, NGC, NGCD, NGCDP
P5            NN, NNG, NNGC, NNGCD, NNGCDP
P6            KN, KNN, KNNG, KNNGC, KNNGCD, KNNGCDP
P7            AK, AKN, AKNN, AKNNG, AKNNGC, AKNNGCD, AKNNGCDP
P8            MA, MAK, MAKN, MAKNN, MAKNNG, MAKNNGC, MAKNNGCD, MAKNNGCDP

Definition 1: Given C as a "no gap combinations" list. $C_i$ is a no-gap combination. Let L be the gapped combination list generated by the Ladder procedure, as follows:

$L_y(C_i(S_j)) = C_i - i_{y+1}$    (3)

where $L_y(C_i(S_j))$ is the y-th combination generated by the Ladder procedure from no-gap combination $C_i$, and $i_{y+1}$ is the item at position y+1 in $C_i$.

Consider that the first combination in the "no gap combinations" list is MAKNNGCDP. Applying this procedure, we obtain these one-gap combinations: M_KNNGCDP, MA_NNGCDP, MAK_NGCDP, MAKN_GCDP, MAKNN_CDP, MAKNNG_DP, and MAKNNGC_P. Note that MAK_NGCDP is equivalent to MAKN_GCDP, so they are treated as one combination and inserted only once in the "one gap combinations" list as MAKNGCDP.

Second, the Crisscross procedure generates the rest of the possible gapped sequential combinations of the "one gap combinations" list. It reads each combination in the "no gap combinations" list, looping through it and inserting one gap between each character of the combination's characters. It starts from the second character's position, shifted right one character position in each loop.

Definition 2: Given C as a "no gap combinations" list. $C_i$ is a no-gap combination. Let Q be the gapped combination list generated by the Crisscross procedure, as follows:

$Q_r(C_i(S_j)) = C_i - (i_{r+1} \,\&\, i_{r+3} \,\&\, i_{r+5} \,\&\, i_{r+7} \dots i_n)$    (4)

where $Q_r(C_i(S_j))$ is the r-th combination generated by the Crisscross procedure from no-gap combination $C_i$, and $i_{r+1}$ is the item at position r+1 in $C_i$. The concatenation part of the function stops when n is equal to or greater than the number of items in $C_i$.

By applying this procedure to the last example, the MAKNNGCDP no-gap combination will produce: M_K_N_C_P, MA_N_G_D, MAK_N_C_P, MAKN_G_D, MAKNN_C_P, MAKNNG_D, and MAKNNGC_P. Notice that all combinations derived by the two procedures take the same support as the parent no-gap combination from which they are derived. Mix strategy stops generating new combinations when all sequences in the protein sequences file have been processed. The final result of applying the Mix strategy is a list of all combinations derived from all combinations lists.

2) Test strategy:
The Test strategy filters the final combinations list, which contains all no-gap and gapped combinations, to distinguish frequent and infrequent patterns according to the user-specified support. However, infrequent patterns are not discarded because incremental updating will be performed later on.

The most time-consuming step in the Mix & Test algorithm is updating the combinations list, where a search is required to determine whether the generated combination is a new one to insert or an old one whose support must be updated. The combinations list may become very large. Therefore, a lexicographic prefix tree of lists is suggested, where each list contains all combinations with the same prefix. For example, let $P = \{p_1, p_2, \dots, p_n\}$ be a set of lists (here n = 20 amino acids). Each $p_i$ represents a list of all combinations with prefix i. For example, if i = M, the list $p_M$ can contain combinations such as MV, MVV, MTV, MNKLSV. After Mix strategy generates a new combination, the first character of this combination is checked to determine which list it is inserted in. So, instead of having one big list, we will have n lists, which shrinks the time T to find or insert a combination to about T/n. In order to increase the speed of computing and minimize the time required to generate the combinations in Mix strategy, especially with a large number of files and rapid incoming rates, the Parallel Mix strategy (PMix) is proposed. PMix uses horizontal data parallelization, where the data are split into chunks in memory. These data chunks are distributed among PMix threads. Each thread applies Mix strategy to generate the combinations of candidate patterns of its data chunk. After all threads finish their work, a combination integrator module integrates all combinations generated by the threads into one final combinations list. The final combinations list is used by the combinations evaluator module, which applies the Test strategy to get frequent and infrequent patterns.
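
The combination generation of the Mix strategy can be sketched in Python as follows. This is a minimal illustration, not the authors' Java implementation: the function names `no_gap_combinations`, `ladder`, and `crisscross` are hypothetical, gaps are written as underscores as in the paper's examples, and the sketch assumes a trailing gap is simply dropped (as the listed Crisscross outputs suggest).

```python
def no_gap_combinations(seq):
    """Build the no-gap table for one sequence, following Eqs. (1) and (2)."""
    n = len(seq)
    table = {}
    for t in range(1, n):
        # P_t = i_{n-t} & P_{t-1}: the suffix of seq that starts at item n-t.
        p_t = seq[n - 1 - t:]
        # Removing procedure: dropping trailing items one at a time yields every
        # prefix of P_t with at least two items (listed shortest first, as in Table I).
        table[f"P{t}"] = [p_t[:k] for k in range(2, len(p_t) + 1)]
    return table


def ladder(comb, gap="_"):
    """Eq. (3): replace one inner item with a gap, moving right one position per loop."""
    return [comb[:y] + gap + comb[y + 1:] for y in range(1, len(comb) - 1)]


def crisscross(comb, gap="_"):
    """Eq. (4): starting at position r+1, replace every second item with a gap."""
    results = []
    for r in range(1, len(comb) - 1):
        chars = list(comb)
        for pos in range(r, len(comb), 2):
            chars[pos] = gap
        # Assumption: a gap at the very end carries no information and is dropped.
        results.append("".join(chars).rstrip(gap))
    return results


example = "MAKNNGCDP"
combos = no_gap_combinations(example)
one_gap = ladder(example)
multi_gap = crisscross(example)
print(combos["P2"], one_gap[0], multi_gap[0])
```

For the sequence MAKNNGCDP this reproduces the rows of Table I and the Ladder and Crisscross lists given in the text. Merging equivalent gapped forms (such as MAK_NGCDP and MAKN_GCDP) is left out of the sketch.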

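
Support counting, the PMix split-and-integrate step, the prefix-partitioned lists, and the Test strategy described above can be sketched in Python as follows. This is a hedged illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, only no-gap combinations are mined, support is counted once per sequence, and the prefix "tree of lists" is approximated by one dictionary per first character. Python threads are used only to show the structure and would not be expected to give a real speed-up for this CPU-bound work.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor


def substrings(seq):
    """All contiguous combinations of length >= 2 (the no-gap list of one sequence)."""
    return {seq[i:j] for i in range(len(seq)) for j in range(i + 2, len(seq) + 1)}


def mix_chunk(chunk):
    """Mix step for one data chunk. Assumption: support is counted once per sequence."""
    support = Counter()
    for seq in chunk:
        support.update(substrings(seq))
    return support


def pmix(sequences, n_threads=2):
    """Horizontal split into chunks, one Mix run per thread, then integration."""
    chunks = [sequences[i::n_threads] for i in range(n_threads)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = list(pool.map(mix_chunk, chunks))
    merged = Counter()
    for part in partials:
        merged.update(part)  # Counter.update adds the supports together
    return merged


def by_prefix(support):
    """Partition the combinations list into one smaller list per first character."""
    buckets = defaultdict(dict)
    for comb, count in support.items():
        buckets[comb[0]][comb] = count
    return buckets


def split_by_support(support, minsup):
    """Test strategy: separate frequent from infrequent patterns, keeping both."""
    frequent = {c: s for c, s in support.items() if s >= minsup}
    infrequent = {c: s for c, s in support.items() if s < minsup}
    return frequent, infrequent


sequences = ["MAKN", "MAKD", "AKN"]  # toy data, not taken from the paper
support = pmix(sequences, n_threads=2)
frequent, infrequent = split_by_support(support, minsup=2)
buckets = by_prefix(support)
print(sorted(frequent), sorted(buckets["M"]))
```

Keeping the infrequent patterns alongside the frequent ones mirrors the text's point that nothing is pruned, so that supports can later be adjusted by incremental updating.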

3) Incremental updating of pattern, K represents Number of patterns with the same


CSPF saves and records the sequential patterns of each length, S is the number of extracted sequential patterns for a
fold, which are generated from the training phase. However, fold, and W is the weight of the fold.
increasing the speed of processing, especially with large
volumes of data and high data rates, is highly required.
Existing incremental updating algorithms are highly based on
the availability of main memory. As a result, the use of In-
Memory relational databases is proposed, where TimesTen
Oracle database management system is applied. TimesTen is
an In-Memory DBMS technology, which provides very fast
data access time because all its data will reside in physical
memory (RAM) during run time. TimesTen provides
applications with short, consistent response times and very
high throughput required by applications with database-
intensive workloads.
Incremental updating handles two cases: inserting new data and deleting old data. First, the Insert module, shown in Fig. 2, deals with adding new protein files to an existing fold trial: the Mix strategy is applied to obtain the combination patterns of these files. These patterns are sent to the database and added to the previously obtained frequent sequential patterns. Updated patterns fall into four cases: 1) patterns that were frequent in the old database and become infrequent in the new database, 2) patterns that were frequent in the old database and remain frequent in the new database, 3) patterns that were infrequent in the old database and become frequent in the new database, and 4) patterns that were infrequent in the old database and remain infrequent in the new database. Second, the Delete module deals with sequences deleted from the original database, which can leave the stored patterns in an inconsistent state with respect to the same specified minimum support threshold. The Delete procedure is similar to the Insert procedure. When some protein sequences are deleted from an existing fold trial, the obtained lists of frequent and infrequent patterns are affected. The Delete module provides two ways of deleting files: directly, by specifying their names, or by specifying a time range so that the files in between are deleted.

Fig. 2. Insert Module

B. Phase II: Protein Fold Recognition
The objective of the fold recognition phase is to classify unknown protein folds. In addition, an incremental updating module is used for maintaining the underlying database.

1) Weight Function for Protein Fold Recognition
The proposed weight function classifies the unknown protein by matching the extracted sequential patterns of each fold against the incoming protein sequence. A weight for each fold with respect to the unknown protein is calculated. The more matched patterns are found, the higher the weight of the fold and the higher the probability that it is selected as the recognized fold. However, two important aspects have to be considered: 1) The length of the matched sequential patterns: the more long frequent patterns are matched, the higher the accuracy of the fold classification. 2) Two folds may have the same number of sequential patterns. The proposed Weight Fold Function is:

Wf = N/S + ∑ (Ki * (Li / Mi))    (5)

where N is the number of matched patterns, M is the maximum length of the extracted patterns for the fold, and L is the length of a matched pattern.

III. APPLICATION
The CSPF technique is evaluated using different parameters, such as different support thresholds, number of sequences, memory consumption, and number of items per sequence. CSPF is trained and tested on a specific set of selected folds from the Structural Classification of Proteins (SCOP) database (http://scop.berkeley.edu/). The ASTRAL SCOP 1.75B dataset, updated on 25-4-2013, is selected, in which no two proteins share more than 40% identity. This release has 49,757 PDB entries and 136,776 domains. For each fold in this set, a corresponding set of at least 30 protein members is obtained from the Protein Data Bank (PDB) [44], which is a worldwide archive of structural data of biological macromolecules. The protein sequences extracted from this release are used to validate the results of the proposed model. Two thirds of this dataset are used in the training phase to establish the feature set for each fold, and one third is used as test data to check the validity of the proposed model. The algorithms are developed in Java with NetBeans IDE 7.2 as the development environment. The algorithms are tested on an Intel Core™ i5 2.50 GHz machine with 6 GB of main memory running Windows 7.

The following performance evaluation tests are carried out: 1) For the no-gap mix strategy: a) comparison of Mix&Test, PMix, and SPAM for a varied number of sequences, b) comparison of Mix&Test, PMix, SPAM, and PrefixSpan for a varied support threshold, and c) comparison of Mix&Test, PMix, SPAM, and PrefixSpan for a changing number of items per sequence. 2) For the gapped mix strategy: comparison of Mix&Test and cSPADE according to changes in the maximum gap value. 3) Incremental updating. 4) Memory consumption. 5) Fold recognition phase: a comparison between the proposed method and SAM, which is widely used as a benchmark in fold recognition [39,45]. However, SAM requires higher computational effort during training, since it employs the Baum–Welch algorithm for training the model, which is an iterative procedure.
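As a concrete illustration of the four update cases handled by the incremental updating module, the following minimal sketch assigns each pattern to its case after an insert or delete. It is not the authors' Java implementation: the function name, the data layout, and the convention that a pattern is frequent when its support is greater than or equal to the threshold are assumptions made for the example.

```python
def classify_updates(old_frequent, new_support, n_sequences, min_support):
    """Assign every pattern to one of the four update cases.

    old_frequent : set of patterns that were frequent before the update
    new_support  : dict mapping pattern -> number of supporting sequences
                   in the updated database
    n_sequences  : number of sequences in the updated database
    min_support  : minimum support threshold as a fraction (0.2 means 20%)

    Assumption: "frequent" means support >= min_support.
    """
    cases = {}
    for pattern in set(old_frequent) | set(new_support):
        was_frequent = pattern in old_frequent
        is_frequent = new_support.get(pattern, 0) / n_sequences >= min_support
        if was_frequent and not is_frequent:
            cases[pattern] = 1  # frequent -> infrequent
        elif was_frequent and is_frequent:
            cases[pattern] = 2  # frequent -> still frequent
        elif is_frequent:
            cases[pattern] = 3  # infrequent -> frequent
        else:
            cases[pattern] = 4  # infrequent -> still infrequent
    return cases


# Hypothetical two-residue patterns and support counts, for illustration only.
old_frequent = {"AL", "GA"}
new_support = {"AL": 1, "GA": 6, "LK": 5, "KV": 1}
cases = classify_updates(old_frequent, new_support, n_sequences=10, min_support=0.5)
for pattern in sorted(cases):
    print(pattern, "-> case", cases[pattern])
```

Only patterns in cases 1 and 3 change the stored list of frequent patterns, which is why an incremental update can be cheaper than mining the whole database again.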
A. Performance analysis of no gap mix strategy

1) Number of sequences test:
In this study, we measure the performance of the Mix&Test, PMix, and SPAM algorithms as the number of sequences changes. Fig. 3 shows the results for data sets ranging from 100,000 to 900,000 sequences, and Fig. 4 shows the results for data sets ranging from 1,000,000 to 5,000,000 sequences. In both figures, Mix&Test and PMix outperform SPAM, taking much less time. In addition, PMix outperforms both Mix&Test and SPAM because of its parallelization step.

Fig. 3. M&T, PMix vs. SPAM having data ranges from 100,000 to 900,000 sequences. Axes: Time (m) vs. Number of sequences.

Fig. 4. M&T, PMix vs. SPAM having data ranges from 1,000,000 to 5,000,000 sequences. Axes: Time (m) vs. Number of sequences.

2) Minimum support threshold test:
Fig. 5 and Fig. 6 show the processing time of Mix&Test and PMix versus PrefixSpan and SPAM at different values of the support threshold, with the number of sequences equal to 25,000 and 50,000, respectively. For protein sequence data with a very low minimum support threshold, PrefixSpan and SPAM take hours to process. On the other hand, Mix&Test and PMix take seconds and are not affected by changes in the minimum support threshold.

Fig. 5. Mix & Test, PMix, PrefixSpan, and SPAM comparisons with varied support threshold (25,000 sequences). Axes: Time (m) vs. Support Threshold.

Fig. 6. Mix & Test, PMix, PrefixSpan, and SPAM comparisons with varied support threshold (50,000 sequences). Axes: Time (m) vs. Support Threshold.

3) Number of items per sequence:
Four tests are applied, with 180 and 300 items per sequence (ips) and a varying support threshold, as shown in Fig. 7(a,b), respectively. Each trial in each test adds 5% to the support threshold value of the previous trial. Thus, the first trial has a support threshold of 5% and the last one a support threshold of 50%. The execution time is measured in each trial. The results of these tests show the relationship between the support threshold and the processing time for the four algorithms: Mix&Test, PMix, PrefixSpan, and SPAM. As shown in Fig. 7(a,b), Mix&Test and PMix are much faster than PrefixSpan and SPAM.

Fig. 7. (a) M&T and PMix vs. PrefixSpan and SPAM under different support threshold and 180 items per sequence. Axes: Time (m) vs. Support Threshold.
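To make the role of the minimum support threshold in these tests concrete, the sketch below counts the support of contiguous (no-gap) patterns and filters them at several thresholds. It is a naive illustration only, not the Mix&Test, PMix, PrefixSpan, or SPAM algorithm; the function names and the toy sequences are assumptions.

```python
def support(pattern, sequences):
    """Fraction of sequences that contain `pattern` as a contiguous substring."""
    return sum(pattern in seq for seq in sequences) / len(sequences)


def frequent_patterns(sequences, length, min_support):
    """All contiguous patterns of the given length whose support is at
    least `min_support` (a fraction between 0 and 1)."""
    # Candidate generation: every window of `length` items in every sequence.
    candidates = {
        seq[i:i + length]
        for seq in sequences
        for i in range(len(seq) - length + 1)
    }
    return sorted(p for p in candidates if support(p, sequences) >= min_support)


# Toy amino-acid strings, made up for the example.
sequences = ["MKVLA", "KVLAG", "AKVLM", "GGKVA"]
for threshold in (0.25, 0.50, 0.75, 1.00):
    print(threshold, frequent_patterns(sequences, 2, threshold))
```

Lowering the threshold lets more candidates survive, which is the reason candidate-based miners slow down sharply at low support values.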


Fig. 7. (b) M&T and PMix vs. PrefixSpan and SPAM under different support threshold and 300 items per sequence. Axes: Time (m) vs. Support Threshold.

B. Performance analysis of gapped mix strategy
In this case, the performance of Mix&Test and PMix versus the cSPADE algorithm is tested as the maximum gap value changes, as illustrated in Fig. 8. The minimum support threshold equals 35%. One can observe that the higher the gap value, the longer the processing time, and that Mix&Test and PMix outperform cSPADE for small gap values. In addition, PMix outperforms both Mix&Test and cSPADE.

Fig. 8. Mix&Test and PMix vs. cSPADE under different Maximum Gap Values. Axes: Time (s) vs. Maximum Gap.

C. Performance analysis of Incremental Updating Process
The incremental updating module is implemented with two different database management systems. The first is the MySQL DBMS with a conventional disk-resident database and the other is the Oracle TimesTen database, as explained previously. The performance of Mix&Test(TimesTen) and Mix&Test(MySql) is tested as the number of sequences changes (in this case from 10,000 to 50,000 sequences). A support threshold of 20% with no gap is applied, as illustrated in Fig. 9. Mix&Test(TT) outperforms Mix&Test(MySql): Mix&Test(TT) takes around 30 seconds to process the 10,000-sequence file, whereas M&T(MySql) takes around 200 seconds. This is because the TimesTen database is more efficient than the MySQL DBMS: it offers a small, fast, multithreaded, transactional database engine with in-memory and disk-based tables.

Fig. 9. Mix&Test(TT) and Mix&Test(MySql) under different Sequences File volumes. Axes: Time (s) vs. Number of sequences.

D. Performance Analysis of Memory Consumption
The memory consumption of Mix&Test and PMix is evaluated versus cSPADE under two aspects: different gap values and a varying number of sequences. For changing gap values, Mix&Test and PMix are tested versus cSPADE using a sequence file with 30,000 sequences and a minimum support threshold of 30%, as illustrated in Fig. 10. PMix consumes more memory than Mix&Test because it runs multiple threads at the same time. cSPADE consumes much more memory than both Mix&Test and PMix.

Fig. 10. The memory consumption of M&T and PMix vs. cSPADE under different gap values. Horizontal axis: Maximum Gap.

E. Performance Analysis of Fold Recognition Phase
The fold recognition phase of the CSPF technique is trained and tested on the dataset described previously [13]. In Table II, we compare the sensitivity of CSPF with the sensitivity of SPM for fold recognition. The sensitivity of each model is the number of proteins classified successfully out of all proteins under evaluation.

CSPF reported an overall accuracy on the training data of 75.84%, with MaxGap=0 and MinSup=20%, while the overall accuracy of the "SPM for FR" model is 59.7% with MaxGap=3 and MinSup=40%. A set of 804 proteins (the test data set) is used to measure the accuracy of the model on the test set. CSPF reported an overall accuracy on the testing data of 34.32%, as shown in Table III.
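The sensitivity and overall accuracy used in the fold recognition evaluation can be recomputed from the per-fold counts. The sketch below is a minimal illustration; the function names are assumptions, and the example counts are the CSPF training counts reported for folds a1, a3, and a4 and for the whole training set.

```python
def sensitivity(correct, total):
    """Percentage of the proteins of one fold that were classified correctly."""
    return 100.0 * correct / total


def overall_accuracy(per_fold):
    """Pool the (correct, total) counts of all folds into one percentage."""
    correct = sum(c for c, _ in per_fold.values())
    total = sum(t for _, t in per_fold.values())
    return 100.0 * correct / total


# (correct, total) counts for three folds, CSPF on the training data.
folds = {"a1": (20, 21), "a3": (20, 20), "a4": (38, 103)}
for name, (correct, total) in folds.items():
    print(name, round(sensitivity(correct, total), 2))

# Pooled counts over all folds of the training set.
print("overall", round(overall_accuracy({"all": (1218, 1606)}), 2))
```

Because the overall figure pools the counts, large folds such as a4 weigh more than small ones; it is not the average of the per-fold percentages.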


TABLE II. SENSITIVITY FOR ALL FOLDS AND OVERALL ACCURACY OF THE PROPOSED CSPF TECHNIQUE AND "SPM FOR FOLD RECOGNITION (FR)"

Fold index         CSPF Sensitivity        SPM for FR Sensitivity
                   (Proteins)   (%)        (Proteins)   (%)
a1                 20/21        95.2       15/21        71.4
a3                 20/20        100        17/20        85
a4                 38/103       36.89      30/103       29.1
a24                28/28        100        28/28        100
a39                27/31        87.09      26/31        83
a60                21/25        84         19/25        76
a118               30/32        93.75      28/32        87.5
Class A (total)    184/260      70.76      163/260      62.7
b1                 81/132       61.36      68/132       51.5
b2                 20/20        100        19/20        95
b18                21/21        100        20/21        95.2
b29                22/24        91.6       21/24        87.5
b34                22/44        50         10/44        22.7
b40                26/61        42.6       25/61        41
b47                25/25        100        24/25        96
b55                18/24        75         16/24        66
b82                22/28        78.5       20/28        71.4
b121               27/27        100        26/27        96.3
Class B (total)    284/406      69.95      249/406      61.3
c1                 82/143       57.34      16/143       11.2
c2                 88/91        96.70      85/91        93.4
c3                 20/22        90.9       22/22        100
c23                49/58        84.4       30/58        51.7
c26                31/35        88.57      29/35        82.9
c37                79/91        86.8       32/91        35.2
c47                34/39        87.1       22/39        56.4
c55                31/31        100        30/31        96.8
c56                18/20        90         20/20        100
c66                36/40        90         27/40        67.5
c67                30/31        96.77      31/31        100
c69                32/34        94.1       29/34        85.3
c94                22/23        95.6       19/23        82.6
Class C (total)    552/658      83.8       392/658      59.6
d15                39/44        88.6       21/44        47.7
d17                18/20        90         14/20        70
d58                38/102       37.25      22/102       21.6
d144               21/23        91.3       22/23        95.7
Class D (total)    116/189      61.4       79/189       41.8
f23                20/25        80         16/25        64
Class F (total)    20/25        80         16/25        64
g3                 62/68        91.1       60/68        88.2
Class G (total)    62/68        91.1       60/68        88.2
Overall            1218/1606    75.84      959/1606     59.7

Using the same test data sets, and in order to compare the efficiency of the proposed model, the SAM model [16] is also employed. A comparison of the results obtained by CSPF, "SPM for FR", and SAM (E-values ranking) on the test set is presented in Table IV.

CSPF outperforms the other two models: it reports an overall accuracy on the testing data of 34.32%, while the overall accuracy of the "SPM for FR" model was 24.9% and SAM's overall accuracy was 29.4%.

In terms of space complexity, for a sequence file with n sequences, m items per sequence, and an alphabet of 20 items (the 20 amino acids), the space complexity of the Mix&Test algorithm is O(20m+n). In terms of time complexity, generating all the candidate patterns of Mix&Test with no gap is O(n^2). Generating all the candidate patterns of Mix&Test with a gap m is O(n^2 * m). The complexity of discovering the frequent patterns is O(N).

IV. CONCLUSIONS
In this work, we proposed the CSPF technique for protein fold recognition. This technique consists of two main phases: sequential pattern extraction and protein fold recognition. The sequential pattern extraction phase introduced the Mix&Test algorithm. Several experiments were conducted to assess the performance of Mix&Test and PMix. The performance of the M&T and PMix algorithms was compared with the PrefixSpan, SPAM, and cSPADE algorithms.

In addition, the performance of CSPF fold recognition was compared with the "SPM for FR" and SAM (E-values) models. CSPF outperformed both models: on the training data its overall accuracy was 75.84% versus 59.7% for "SPM for FR", and on the testing data its overall accuracy was 34.32%. Future work on CSPF can go in several directions: utilizing optimization techniques to enhance the prediction results, and applying high-performance computing to speed up processing of protein sequence databases. In addition, more protein sequences will be used.


TABLE III. DETAILED SENSITIVITY RESULTS FOR ALL FOLDS UNDER EVALUATION AND OVERALL ACCURACY OF THE PROPOSED CSPF MODEL IN THE TEST SET

Fold index         CSPF Sensitivity   CSPF Sensitivity
                   (Proteins)         (%)
a1                 4/11               36.36
a3                 8/10               80
a4                 3/52               5.7
a24                15/15              100
a39                11/15              73.3
a60                2/12               16.7
a118               3/16               18.75
Class A (total)    46/131             35.11
b1                 31/66              46.9
b2                 2/10               20
b18                3/10               30
b29                2/12               16.6
b34                11/22              50
b40                12/31              38.7
b47                10/12              83.3
b55                2/12               16.6
b82                0                  0
b121               9/14               64.3
Class B (total)    82/203             40.39
c1                 2/71               2.8
c2                 36/46              78.2
c3                 2/11               18.1
c23                11/29              37.9
c26                7/17               41.1
c37                9/46               19.5
c47                1/20               5
c55                1/15               6.6
c56                0                  0
c66                3/20               15
c67                8/15               53.3
c69                3/17               17.6
c94                9/12               75
Class C (total)    92/329             27.9
d15                7/22               31.8
d17                1/10               10
d58                8/51               15.6
d144               3/12               25
Class D (total)    19/95              20
f23                8/12               66.6
Class F (total)    8/12               66.6
g3                 29/34              85.2
Class G (total)    29/34              85.2
Overall            276/804            34.32

TABLE IV. CLASSIFICATION RESULTS OF THE PROPOSED METHOD CSPF, "SPM FOR FR" ALGORITHM AND SAM (E-VALUES) IN THE TEST SET

Fold index         CSPF              SPM for FR        SAM (E-values)
                   Sensitivity (%)   Sensitivity (%)   Sensitivity (%)
a1                 36.36             81.8              18.2
a3                 80                60                20
a4                 5.7               3.8               28.8
a24                100               6.7               33.3
a39                73.3              87.7              66.7
a60                16.7              16.7              16.7
a118               18.75             0                 37.5
Class A (total)    35.11             25.2              32.1
b1                 46.9              50                36.4
b2                 20                0                 30
b18                30                30                20
b29                16.6              25                8.3
b34                50                36.4              0
b40                38.7              6.5               19.4
b47                83.3              83.3              58.3
b55                16.6              25                0
b82                0                 14.3              0
b121               64.3              7.1               64.3
Class B (total)    40.39             32                25.6
c1                 2.8               14.1              0
c2                 78.2              23.9              69.6
c3                 18.1              100               9.1
c23                37.9              27.6              24.1
c26                41.1              11.8              47.1
c37                19.5              80.4              10.9
c47                5                 25                0
c55                6.6               13.3              0
c56                0                 10                0
c66                15                20                5
c67                53.3              80                46.7
c69                17.6              5.9               5.9
c94                75                25                58.3
Class C (total)    27.9              32.5              21
d15                31.8              0                 9.1
d17                10                0                 0
d58                15.6              3.9               3.9
d144               25                91.7              16.7
Class D (total)    20                13.7              6.3
f23                66.6              25                41.7
Class F (total)    66.6              25                41.7
g3                 85.2              44.1              76.5
Class G (total)    85.2              44.1              76.5
Overall            34.32             29.4              24.9


REFERENCES
[1] A. Sharaf Eldin, T. H.A. Soliman, M. E. Marie, and M. M. Ghareeb, “A Deep Glimpse into Protein Fold Recognition,” International Journal of Sciences, vol. 2, pp. 24-33, 2013.
[2] D. T. Jones, “GenTHREADER: An efficient and reliable protein fold recognition method for genomic sequences,” J. Mol. Biol., vol. 287, pp. 797-815, 1999.
[3] K. Lin, A. May, and W. Taylor, “Threading using neural network (TUNE): The measure of protein sequence-structure compatibility,” Bioinformatics, vol. 18, pp. 1350-1357, 2002.
[4] W. Thomas, C. Igel, and J. Gebert, “Protein fold class prediction using neural networks with tailored early-stopping,” IEEE International Joint Conference on Neural Networks IJCNN, vol. 3, pp. 1693-1697, 2004.
[5] A. Raval, Z. Ghahramani, and D. Wild, “A Bayesian network model for protein fold and remote homologue recognition,” Bioinformatics, vol. 18, pp. 788-801, 2002.
[6] J. Xu, “Fold recognition by predicted alignment accuracy,” IEEE/ACM Trans. Comput. Biol. Bioinform, vol. 2, pp. 157-165, 2005.
[7] M. Judy and K. Ravichandran, “A solution to protein folding problem using a genetic algorithm with modified keep best reproduction strategy,” Proceeding of IEEE Congress on Evolution Computation, Sep. 25-28, Singapore, pp. 4776-4780, 2007.
[8] R. Unger, “The genetic algorithm approach to protein structure prediction,” Struct. Bond., vol. 110, pp. 153-175, 2004.
[9] S. Han, B. Lee, S. Yu, C. Jeong, S. Lee, and D. Kim, “Fold recognition by combining profile-profile alignment and support vector machine,” Bioinformatics, vol. 21, pp. 2667-2673, 2005.
[10] W. Chmielnicki and K. Stapor, “Protein fold recognition with combined SVM-RDA classifier,” Proceedings of the HAIS, Part I, LNAI 2010, vol. 6076, pp. 162-169, 2010.
[11] F. Liang and W. Wong, “Evolutionary Monte Carlo for protein folding simulations,” J. Chemi. Phy., vol. 115, pp. 3374-3380, 2001.
[12] N. Alione, “Parallel evolution strategy on grids for the protein threading problem,” J. Parallel Distributed Computing, vol. 66, pp. 489-1502, 2006.
[13] C. Carpio, S. Sasaki, L. Baranyi, and H. Okada, “A parallel hybrid GA for peptide 3D structure prediction,” Proceedings of the Workshop on Genome Informatics, Universal Academy Press, Tokyo, 1995.
[14] R. Islam and A. Ngom, “Parallel evolution strategy for protein threading,” Proceedings of the 25th International Conference on Chilean Computer Science Society, IEEE Computer Society, Washington, DC., USA, pp. 2347-2354, 2005.
[15] D. Nguyen, I. Yoshihara, K. Yamamori, and M. Yasunaga, “Aligning multiple protein sequences by parallel hybrid genetic algorithm,” Genome Inform, vol. 13, pp. 123-132, 2002.
[16] S. Thomas and N. Amato, “Parallel protein folding with STAPL,” Proceedings of the IEEE 18th International Parallel and Distributed Processing Symposium, Washington, DC., USA, doi: 10.1109/IPDPS.2004.1303204, 2004.
[17] R. Agrawal and R. Srikant, “Mining sequential patterns,” Proceedings of the 1995 International Conference on Data Engineering, pp. 3–14, 1995.
[18] A. Brazma, I. Johansen, J. Vilo, E. Ukkonen, “Pattern Discovery and Biosequences,” Honavar, V G, Slutzki G (eds.) ICGI, LNCS (LNAI) 2000, vol. 1433, pp. 257-270, Springer, Heidelberg, 2000.
[19] M. Ester, “A top-down method for mining most specific frequent patterns in biological sequence data,” Proceedings of the 2004 SIAM International Conference on Data Mining (SDM ’04), pp. 90–101, 2004.
[20] K. Wang, Y. Xu, and J. Yu, “Scalable sequential pattern mining for biological sequences,” Proceedings of the 2004 ACM International Conference on Information and Knowledge Management (CIKM ’04), pp. 178–187, 2004.
[21] Y. Xiong, J. He, and Y. Zhu, “TOPPER: An algorithm for mining top k patterns in biological sequences based on regularity measurement,” Proceedings of IEEE Bioinformatics and Biomedicine Workshops (BIBMW), pp. 283-288, 2004.
[22] L. Chen and W. Liu W, “An algorithm for mining frequent patterns in biological sequence,” Proceedings of the IEEE 1st International Conference on Computational Advances in Bio and Medical Sciences (ICCABS), pp. 63-68, 2011.
[23] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M. Hsu, “FreeSpan: frequent pattern-projected sequential pattern mining,” Proceedings of the ACM 2000 SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’00), pp. 355–359, 2000.
[24] M. Zaki, “SPADE: an efficient algorithm for mining frequent sequences,” Machine Learning, vol. 42 (1/2), pp. 31–60, 2001.
[25] J. Pei, J. Han, B. Mortazavi-Asl, J. Wang, H. Pinto, Q. Chen, U. Dayal, and M. Hsu, “Mining sequential patterns by pattern-growth: the PrefixSpan approach,” IEEE Transactions on Knowledge and Data Engineering, vol. 16 (11), pp. 1424–1440, 2004.
[26] X. Yan, J. Han, and R. Afshar, “CloSpan: mining closed sequential patterns in large datasets,” Proceedings of the 2003 SIAM International Conference on Data Mining (SDM ’03), pp. 166–177, 2003.
[27] P. Tzvetkov, X. Yan, and J. Han, “TSP: mining Top-K closed sequential patterns,” Knowledge and Information Systems, vol. 7 (4), pp. 438–457, 2005.
[28] J. Wang, J. Han, and C. Li, “Frequent closed sequence mining without candidate maintenance,” IEEE Transactions on Knowledge and Data Engineering, vol. 19 (8), pp. 1042–1056, 2007.
[29] C. Luo and S. Chung, “Efficient mining of maximal sequential patterns using multiple samples,” Proceedings of the 2005 SIAM International Conference on Data Mining (SDM ’05), pp. 64–72, 2005.
[30] J. Pei, J. Han, and W. Wang, “Mining sequential patterns with constraints in large databases,” Proceedings of the 2002 ACM International Conference on Information and Knowledge Management (CIKM ’02), pp. 18–25, 2002.
[31] X. Ji, J. Bailey, and G. Dong, “Mining minimal distinguishing subsequence patterns with gap constraints,” Knowledge and Information Systems, vol. 11 (3), pp. 259–296, 2005.
[32] E. Chen, H. Cao, Q. Li, and T. Qian, “Efficient strategies for tough aggregate constraint-based sequential pattern mining,” Information Sciences, vol. 178 (6), pp. 1498–1518, 2008.
[33] H. Kum, J. Pei, W. Wang, and D. Duncan, “ApproxMAP: approximate mining of consensus sequential patterns,” Proceedings of the 2003 SIAM International Conference on Data Mining (SDM ’03), pp. 311–315, 2003.
[34] C. Kum, J. Chang, and W. Wang, “Sequential pattern mining in multi-databases via multiple alignment,” Data Mining and Knowledge Discovery, vol. 12, pp. 151–180, 2002.
[35] J. Yang, P. Yu, W. Wang, and J. Han, “Mining long sequential patterns in a noisy environment,” Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data (SIGMOD ’02), pp. 406–417, 2002.
[36] H. Cheng, X. Yan, and J. Han, “IncSpan: incremental mining of sequential patterns in large databases,” Proceedings of the 2004 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’04), pp. 527–532, 2004.
[37] M. Lin, S. Hsueh, and C. Chang, “Fast discovery of sequential patterns in large databases using effective time-indexing,” Information Sciences, vol. 178 (22), pp. 4228–4245, 2008.
[38] M. Zaki, “Sequence Mining in Categorical Domains: Incorporating Constraints,” Proceedings of the 9th Int’l Conference on Information and Knowledge Management, Washington, DC, 2000.
[39] T. Exarchos, C. Papaloukas, C. Lampros, and D. Fotiadis, “Mining sequential patterns for protein fold recognition,” Journal of Biomedical Informatics, vol. 41, pp. 165–179, 2008.
[40] U. Sakakibara, “Grammatical Inference in Bioinformatics,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27 (7), pp. 1051-1062, 2005.
[41] P. Peris, D. Lopez, and M. Campos, “Igtm: an algorithm to predict transmembrane domains and toplogy in proteins,” BMC-Bioinformatics, vol. 9, pp. 367-378, 2008.
[42] P. Peris, D. Lopez, M. Campo, and J. Sempere, “Protein motif prediction by grammatical inference,” in Sakakibara, Y., Kobayashi, S., Nishino, T., Tomita, E. (eds) ICGI 2006, LNCS (LNAI), vol. 4201, pp. 175-187, Springer, Heidelberg, 2008.
[43] T. Yokomori and S. Kobayashi, “Learning Local Languages and Their Application to DNA Sequence Analysis,” IEEE Transactions Pattern Analysis Machine Intelligence, vol. 20 (10), pp. 1067-1079, 1998.
[44] H. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. Bhat, H. Weissig, I. Shindyalov, P. Bourne, “The Protein Data Bank,” Nucleic Acids Research, vol. 28, pp. 235–242, 2000.
[45] D. Fischer, “Servers for protein structure prediction,” Curr Opin Struct Biol, vol. 16 (2), pp. 178–182, 2006.


Facial Expression Recognition Using 3D Convolutional Neural Network

Young-Hyen Byeon
Department of Control and Instrumentation Engineering, Chosun University, Gwangju, Korea

Keun-Chang Kwak*
Department of Control and Instrumentation Engineering, Chosun University, Gwangju, Korea

Abstract—This paper is concerned with video-based facial expression recognition, which is frequently used in conjunction with HRI (Human-Robot Interaction) to enable natural interaction between human and robot. For this purpose, we design a 3D-CNN (3D Convolutional Neural Network) by augmenting dimensionality reduction methods such as PCA (Principal Component Analysis) and TMPCA (Tensor-based Multilinear Principal Component Analysis) to recognize simultaneously the successive frames of facial expression images obtained through a video camera. The 3D-CNN can achieve some degree of shift and deformation invariance using local receptive fields and spatial subsampling, through dimensionality reduction of the redundant CNN output. The experimental results on a video-based facial expression database reveal that the presented method shows good performance in comparison with conventional methods such as PCA and TMPCA.

Keywords—convolutional neural network; facial expression recognition; deep learning

I. INTRODUCTION
HRI (Human-Robot Interaction) is a critical technology for evaluating, developing, and designing interaction environments in which an intelligent system carries out cognitive and emotional interaction between human and robot through communication channels. Its purpose is to understand a user's intention comprehensively and then respond [1]-[5].

HRI differs fundamentally from conventional HCI (Human-Computer Interface) in the autonomy of the robot, the bidirectionality of the interaction, and the diversity of interaction or control levels [6][7].

To provide services in everyday life with humans, a robot needs the ability to interact with humans using the same means that humans use. Likewise, efficient interaction between human and robot requires a system based on the C3 paradigm, which consists of modules that develop the convenience, cooperativeness, and closeness of the interaction. Efficient interaction also requires the convergence of technologies from various areas. For example, it needs a multimodal interaction method that can mediate various communication channels such as vision, hearing, and touch through a mediating interface, and multimodal technology that fuses information input through various interactive channels. Besides, performing work appropriate to the user's requests and the situation requires a series of cognitive processes such as situation recognition, inference, decision making, and planning. It also needs technologies that let the robot make emotional moves fitted to the situation and that characterize the robot by recognizing various emotional responses such as voice and expression [6]. We especially focus on the video-based facial expression recognition technique.

People express their minds through gestures and facial expressions. The facial expression is the most useful and natural means of conveying one's mind. Facial expressions have long been studied by cognitive scientists, and recently many researchers have tried to develop methods to recognize facial expressions automatically and accurately [8][9].

Methods to recognize facial expressions include comparing the positions of the eyes, nose, and mouth [10], optical flow extracting muscle movement [11][12], PCA (Principal Component Analysis) [13], and LDA (Linear Discriminant Analysis) [9][14][15]. Ekman, a psychologist who researches facial expression, said that a human facial expression disappears within seconds [16]. That is, facial expression recognition needs to work whether the facial expression is held for a long or a short time [17]. This means we need to study facial expression recognition using both still images and video with a time base [18][19]. However, the methods referred to above are somewhat difficult to use for video-based facial expression recognition, which involves successive frames of facial expression images.

On the other hand, Convolutional Neural Networks (CNN) have been successfully applied to face recognition from two-dimensional images [20]. The networks incorporate constraints and achieve some degree of shift and deformation invariance. This method has been demonstrated to be successful in various fields such as character recognition [21], document recognition [22], object recognition [23], handwritten digit recognition [24], EEG signal classification [25], and facial expression recognition [26]. However, conventional CNNs are currently limited in handling video-based images. Furthermore, because this network is trained with the usual backpropagation gradient descent procedure, it is not appropriate for video-based face images with redundant CNN output.

Therefore, we design a 3D-CNN (3D Convolutional Neural Network) by expanding the 2D structure of the CNN to a 3D structure for video-based facial expression recognition. There has been no study that tried to apply a 3D-CNN to video-based facial recognition. The experiment uses the video-based facial expression database of CNU captured from a video camera. The experimental results reveal that the 3D-CNN shows good performance in comparison with previous approaches with vector representation such as PCA and TMPCA [17].


Conventional PCA and TMPCA are described in Section II. The 3D-CNN is explained in Section III and the experimental results are given in Section IV. We draw conclusions in Section V.

II. RELATED WORK

A. Principal Component Analysis

PCA is a second-order statistical method that exploits the variance of the data and is used to efficiently reduce the dimensionality of high-dimensional input data. In summary, PCA reduces dimensionality by linearly projecting the whole image data onto the eigenvectors along which the data have the largest variance [27][28].

Fig. 1. Projection of PCA

Although PCA reduces the data to one dimension, the data can still be separated into classes, as shown in Fig. 1. That is, PCA has the merits of preserving information about the distribution of the input data, reducing computation, reducing noise in the data, and compressing the data. The training procedure of PCA is described in Table I.

TABLE I. TRAINING OF PCA

1. Definition of the p training image vectors

$$X = [\,x_1 \mid x_2 \mid \cdots \mid x_p\,] \tag{1}$$

2. Subtraction of the average image from each image vector

$$\bar{x}_i = x_i - m, \qquad m = \frac{1}{p}\sum_{j=1}^{p} x_j \tag{2}$$

3. Covariance matrix of the p vectors $\bar{x}_i$

$$C = \bar{X}\bar{X}^{T}, \qquad \bar{X} = [\,\bar{x}_1 \mid \bar{x}_2 \mid \cdots \mid \bar{x}_p\,] \tag{3}$$

4. Definition of the eigenvalues and eigenvectors of the covariance matrix

$$C u_k = \lambda_k u_k \tag{4}$$

5. Definition of the feature vectors of the training images

Here, an eigenvector of the covariance matrix gives a direction along which the variance is largest, and the eigenvalue matched to it measures the variability along that direction. This eigenvector is the eigenface. Fig. 2 shows face images that are linear combinations of eigenfaces weighted by a feature vector. The verification procedure of PCA is described in Table II.

Fig. 2. Face images by linearly combining an eigenface with a feature vector

TABLE II. VERIFICATION OF PCA

1. Definition of the p verification image vectors

$$X = [\,x_1 \mid x_2 \mid \cdots \mid x_p\,] \tag{5}$$

2. Subtraction of the average image from each image vector

$$\bar{x}_i = x_i - m, \qquad m = \frac{1}{p}\sum_{j=1}^{p} x_j \tag{6}$$

3. Definition of the feature vector of a verification image using the eigenvectors

$$\tilde{x}_i = U^{T}\bar{x}_i \tag{7}$$

where the columns of $U$ are the eigenvectors $u_k$.

Finally, a face image is classified by measuring the similarity between the feature vector calculated above and the feature vectors stored from training, and assigning it to the class of the closest training vector. The similarity measure is described in detail in Section IV [29].

B. Multilinear Principal Component Analysis

MPCA (Multilinear Principal Component Analysis) is a method that computes the covariance directly from the tensor data, without first transforming them into 1D vectors. Tensors are normally denoted by $\mathcal{A}$ and the order of a tensor by N; each index defines a mode. Tensors with N>2 can be regarded as higher-order generalizations of vectors and matrices. An element of a tensor is written with its indices in parentheses. A tensor of order N is $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$. It is addressed by N indices $i_n$, $n = 1, \ldots, N$, and each $i_n$ addresses the n-mode of $\mathcal{A}$. The n-mode product of a tensor $\mathcal{A}$ by a matrix $U \in \mathbb{R}^{J_n \times I_n}$ is given as follows.

$$(\mathcal{A} \times_n U)(i_1, \ldots, i_{n-1}, j_n, i_{n+1}, \ldots, i_N) = \sum_{i_n} \mathcal{A}(i_1, \ldots, i_N)\, U(j_n, i_n) \tag{8}$$

The scalar product of two tensors $\mathcal{A}, \mathcal{B} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ is defined as follows.

$$\langle \mathcal{A}, \mathcal{B} \rangle = \sum_{i_1}\sum_{i_2}\cdots\sum_{i_N} \mathcal{A}(i_1, \ldots, i_N)\, \mathcal{B}(i_1, \ldots, i_N) \tag{9}$$

The Frobenius norm of a tensor $\mathcal{A}$ is defined as $\|\mathcal{A}\|_F = \sqrt{\langle \mathcal{A}, \mathcal{A} \rangle}$, and in standard multilinear algebra the tensor $\mathcal{A}$ is expressed as the following product.

$$\mathcal{A} = \mathcal{S} \times_1 U^{(1)} \times_2 U^{(2)} \cdots \times_N U^{(N)} \tag{10}$$

where $\mathcal{S}$ is the core tensor and $U^{(n)}$ is an orthogonal $I_n \times I_n$ matrix. For a set of M tensors $\{\mathcal{A}_1, \ldots, \mathcal{A}_M\}$ in the space $\mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, the total variance is given by the following equations.

$$\Psi_{\mathcal{A}} = \sum_{m=1}^{M} \|\mathcal{A}_m - \bar{\mathcal{A}}\|_F^{2} \tag{11}$$


$$\bar{\mathcal{A}} = \frac{1}{M}\sum_{m=1}^{M} \mathcal{A}_m \tag{12}$$

Here, $\bar{\mathcal{A}}$ is the mean tensor, and the n-mode total scatter matrix is given by the following equation.

$$S_T^{(n)} = \sum_{m=1}^{M} \left(A_{m(n)} - \bar{A}_{(n)}\right)\left(A_{m(n)} - \bar{A}_{(n)}\right)^{T} \tag{13}$$

where $A_{m(n)}$ is the n-mode unfolded matrix of $\mathcal{A}_m$. MPCA maximizes a tensor-based scatter criterion, and this problem can be solved by separating it into N linear optimizations.

With all the other projection matrices $\tilde{U}^{(1)}, \ldots, \tilde{U}^{(n-1)}, \tilde{U}^{(n+1)}, \ldots, \tilde{U}^{(N)}$ given, $\tilde{U}^{(n)}$ consists of the eigenvectors corresponding to the largest eigenvalues of the matrix $\Phi^{(n)}$. The set $\{\tilde{U}^{(n)}\}$ is chosen to maximize the scatter as in equation (14), and the eigenvectors are calculated from the matrix in equation (15).

$$\{\tilde{U}^{(n)},\ n = 1, \ldots, N\} = \arg\max_{\tilde{U}^{(1)}, \ldots, \tilde{U}^{(N)}} \Psi_{\mathcal{Y}} \tag{14}$$

$$\Phi^{(n)} = \sum_{m=1}^{M} \left(A_{m(n)} - \bar{A}_{(n)}\right) \tilde{U}_{\Phi^{(n)}} \tilde{U}_{\Phi^{(n)}}^{T} \left(A_{m(n)} - \bar{A}_{(n)}\right)^{T} \tag{15}$$

where $\Psi_{\mathcal{Y}}$ is the total variance (11) of the projected tensors and $\tilde{U}_{\Phi^{(n)}} = \left(\tilde{U}^{(n+1)} \otimes \tilde{U}^{(n+2)} \otimes \cdots \otimes \tilde{U}^{(N)} \otimes \tilde{U}^{(1)} \otimes \tilde{U}^{(2)} \otimes \cdots \otimes \tilde{U}^{(n-1)}\right)$.

The projection obtained with the matrices $\tilde{U}^{(n)}$ produces $\prod_{n} P_n$ features, where $P_n$ is the number of eigenvectors kept in mode n. Because not all of them are useful for recognition, some of them can be selected according to their class discriminability. The feature vector ordered by class discriminability is calculated from the following projection [17].

$$\mathcal{Y}_m = \mathcal{A}_m \times_1 \tilde{U}^{(1)T} \times_2 \tilde{U}^{(2)T} \cdots \times_N \tilde{U}^{(N)T} \tag{16}$$

C. Tensor-based Multilinear Principal Component Analysis

Conventional facial expression recognition classifies every incoming facial expression image individually. However, it is difficult for an ordinary person to hold the same facial expression, so such a method is not appropriate for real-time facial expression recognition. TMPCA (Tensor-based Multilinear Principal Component Analysis) is a method that treats several frames as one facial expression. For example, assume that 10 frames per second, each of image size […], arrive from the camera. In the case of TMPCA, a covariance matrix is generated for each frame. The total projection vector's size is […]. A 3D tensor is constructed by adding a time axis to the grayscale images commonly used in facial expression recognition, and recognition is performed using MPCA. TMPCA extracts the essential features of the images by representing them as a 3D tensor in which the frame axis is added directly to the tensor. It improves processing time and recognition performance [17].

III. 3D CONVOLUTIONAL NEURAL NETWORK

A 3D convolution is performed with a 3D kernel on 3D data formed by stacking 2D images. A 3D convolution is expressed by equation (17) and Fig. 3.

$$y(a, b, c) = \sum_{i}\sum_{j}\sum_{k} w(i, j, k)\, v(a + i,\ b + j,\ c + k) \tag{17}$$

where $v$ is the 3D input data, $w$ is the 3D kernel and $y$ is the output.

Fig. 3. Example of a 3D convolution

The subsampled value of a pixel is calculated by multiplying each element of the kernel with the image pixel it overlaps and accumulating the products. Here the kernel values are chosen so that the result is a mean. This process is applied over the whole image, and the resulting image becomes smaller because some of the overlapped pixels are discarded. Subsampling achieves some degree of shift and deformation invariance. Subsampling is illustrated in Fig. 4.

Fig. 4. Example of a subsampling

Fig. 5 shows the structure of the 3D-CNN for video-based facial expression recognition. The structure consists of five layers: the first layer is the input, the second is a convolution, the third is a subsampling, the fourth is a convolution and the fifth is a subsampling. The initial kernel values are random within a specific range. In the first layer, 3D data consisting of five video frames enter the system as input. In the second layer, which has 3 maps, a convolution extracts features from the input.
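The two operations described in this section can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' Matlab implementation: the function names `conv3d_valid` and `subsample_mean` are hypothetical, the convolution is the unflipped "valid" form commonly used in CNNs, and the kernel range of ±0.5 is an arbitrary choice for the random initial values.

```python
import numpy as np

def conv3d_valid(volume, kernel):
    """Valid 3D convolution (CNN style, kernel not flipped) of one volume with one kernel."""
    h, w, t = volume.shape
    kh, kw, kt = kernel.shape
    out = np.empty((h - kh + 1, w - kw + 1, t - kt + 1))
    for a in range(out.shape[0]):
        for b in range(out.shape[1]):
            for c in range(out.shape[2]):
                # Multiply the kernel with the overlapped block and accumulate.
                patch = volume[a:a + kh, b:b + kw, c:c + kt]
                out[a, b, c] = np.sum(patch * kernel)
    return out

def subsample_mean(volume, size=2):
    """Mean over non-overlapping size x size spatial blocks; the time axis is left untouched."""
    h, w, t = volume.shape
    h2, w2 = h // size, w // size
    trimmed = volume[:h2 * size, :w2 * size, :]
    return trimmed.reshape(h2, size, w2, size, t).mean(axis=(1, 3))

# Illustrative input: five stacked 64x48 frames and a random 5x5x3 kernel (assumed range).
rng = np.random.default_rng(0)
frames = rng.random((64, 48, 5))
kernel = rng.uniform(-0.5, 0.5, size=(5, 5, 3))
conv_map = conv3d_valid(frames, kernel)   # shape (60, 44, 3)
pooled_map = subsample_mean(conv_map)     # shape (30, 22, 3)
```

The explicit loops favour readability over speed; each output value is the sum of products over one kernel-sized block, and each subsampled value is the mean of one 2x2 spatial block.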


In the third layer, a subsampling reduces the size of the input image. In the fourth layer, which has 29 maps, a convolution extracts features from the output of the previous layer. In the fifth layer, a subsampling reduces the size of the image from the previous layer. Finally, a feature vector is created by arranging the images of all maps into a single row.

The input data are video-based facial expression images stacked over five successive frames. The size of the data starts at 64x48x5 as input. At the second layer, the size becomes 60x44x3 because a convolution is performed with a kernel of size 5x5x3. At the third layer, the size becomes 30x22x3 because a subsampling is performed with a kernel of size 2x2x1, that is, with no subsampling along the time axis. At the fourth layer, the size becomes 26x18x1 because a convolution is performed with a kernel of size 5x5x3. At the fifth layer, the size becomes 13x9x1 because a subsampling is performed with a kernel of size 2x2x1, again with no subsampling along the time axis. This 13x9x1 map can be turned into a vector by arranging it in a single row, so each map yields 117 feature values. The final size of the feature vector is therefore 3393, because there are 29 maps [20][30].

Fig. 5. Structure of the 3D-CNN used in this study

The classification method used in this study is the Euclidean distance between the vectors previously generated for learning and a vector generated for recognition. The training vector at the closest distance is accepted as the recognition result. The Euclidean distance is expressed by equation (18) [29].

$$d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{n} |x_i - y_i|^{2}} \tag{18}$$

IV. EXPERIMENTAL RESULTS

To evaluate the performance of the 3D-CNN, the CNU facial expression database is used. This database was captured with a video camera. There are training and checking data, which consist of 15 frames for every facial expression. Six facial expressions (happiness, sadness, anger, surprise, disgust, and fear) were captured from 10 people. To analyze performance, we used a computer with a 3.10 GHz Intel(R) Core(TM) i5 4440 CPU and 8 GB of memory, and the software used is Matlab R2013A. The experiment compares the proposed method with the conventional methods PCA and TMPCA. Fig. 6 shows images of facial expressions from the CNU database.

Fig. 6. Facial expression images captured by video camera

The CNU facial expression database contains the facial expressions of ten people. Every facial expression image is resized to 64x44. The single images are grouped every five frames, because a facial expression consists of 5 frames in the CNU data, so the size of an input is 64x44x5. Fig. 7 shows an example of video data in which five frames are linked successively. The structure of the 3D-CNN we use is summarized in Table III [17][31].

Fig. 7. Example of video data that five frames are linked successively

Feature vectors are obtained for all training data and all checking data. The label of the training feature vector with the smallest distance to a checking feature vector is taken as its class. Finally, the recognition rate is obtained by dividing the number of successful recognitions by the total number of trials. As a further experiment, we vary the number of maps at the second and fourth layers. The number of maps at the second layer is varied from 1 to 30 and the number of maps at the fourth layer is varied from 1 to 30. The results of the experiment are summarized in Table IV, which gives the best recognition rate and the corresponding numbers of maps. Fig. 8 shows the images that are generated by the 3D-CNN for the first data sample. Histogram equalization was applied to those images.
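The size bookkeeping of Section III and the distance-based classification used in the experiments can be checked with a short sketch. It is illustrative only: the function names are hypothetical, the toy feature vectors and labels are invented for the example, and it is not the authors' Matlab code.

```python
import numpy as np

def valid_conv_shape(shape, kernel):
    """Output shape of a valid convolution: each axis shrinks by (kernel - 1)."""
    return tuple(s - k + 1 for s, k in zip(shape, kernel))

def subsample_shape(shape, window=(2, 2, 1)):
    """Output shape of non-overlapping subsampling with the given window."""
    return tuple(s // w for s, w in zip(shape, window))

def feature_vector_length(input_shape=(64, 48, 5), kernel=(5, 5, 3), maps=29):
    """Length of the final feature vector for the five-layer structure in the text."""
    shape = valid_conv_shape(input_shape, kernel)  # layer 2: 60x44x3
    shape = subsample_shape(shape)                 # layer 3: 30x22x3
    shape = valid_conv_shape(shape, kernel)        # layer 4: 26x18x1
    shape = subsample_shape(shape)                 # layer 5: 13x9x1
    return int(np.prod(shape)) * maps

def nearest_neighbor_labels(train_feats, train_labels, test_feats):
    """Give each test vector the label of its closest training vector (Euclidean distance)."""
    train_feats = np.asarray(train_feats, dtype=float)
    test_feats = np.asarray(test_feats, dtype=float)
    diffs = test_feats[:, None, :] - train_feats[None, :, :]
    dists = np.sqrt(np.sum(diffs ** 2, axis=2))
    return np.asarray(train_labels)[np.argmin(dists, axis=1)]

def recognition_rate(predicted, actual):
    """Fraction of correctly recognized samples."""
    return float(np.mean(np.asarray(predicted) == np.asarray(actual)))

# Toy example with made-up 2D feature vectors.
train = [[0.0, 0.0], [10.0, 10.0]]
labels = ["happiness", "anger"]
predicted = nearest_neighbor_labels(train, labels, [[1.0, -1.0], [9.0, 11.0]])
print(feature_vector_length(), list(predicted), recognition_rate(predicted, labels))
```

With the default arguments, `feature_vector_length()` reproduces the 117 values per map and 29 maps stated in Section III.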


TABLE III. STRUCTURE OF 3D-CNN

| Layer | Type | Kernel size |
|---|---|---|
| 1 | Input | |
| 2 | Convolution | 5x5x5 |
| 3 | Subsampling | 2x2 |
| 4 | Convolution | 5x5x6 |
| 5 | Subsampling | 2x2 |

TABLE IV. RESULT OF EXPERIMENT

| Data | Number of maps (second layer) | Number of maps (fourth layer) | Recognition rate |
|---|---|---|---|
| Checking | 3 | 29 | 95% |

Fig. 8. Images generated during 3D-CNN

To compare the 3D-CNN proposed in this paper with PCA and TMPCA in facial expression recognition, the recognition rate of each method is shown in Table V. The recognition rate of the 3D-CNN is 6.7 percentage points higher than that of PCA and 4.44 percentage points higher than that of TMPCA.

TABLE V. COMPARISON OF 3D-CNN WITH PCA AND TMPCA

| Method | Recognition rate |
|---|---|
| PCA [13] | 88.3% |
| TMPCA [17] | 90.56% |
| 3D-CNN | 95% |

V. CONCLUSIONS

We have designed a 3D Convolutional Neural Network for video-based facial expression images as a technique for human-robot interaction. The 3D-CNN can simultaneously recognize the successive frames of facial expression images obtained through a video camera. The experimental results reveal that the performance of the 3D-CNN varies with the number of maps and that it performs well in comparison with conventional methods such as PCA and TMPCA. In future experiments, we will try to combine the 3D-CNN with other methods that may enhance its performance.

REFERENCES

[1] A. S. Sekmen, M. Wilkes, K. Kawamura, "An application of passive human-robot interaction: human tracking based on attention distraction", IEEE Trans. on Systems, Man, and Cybernetics-Part A, vol. 32, no. 2, pp. 248-259, 2002.
[2] X. Yin, M. Xie, "Finger identification and hand posture recognition for human-robot interaction", Image and Vision Computing, vol. 25, pp. 1291-1300, 2007.
[3] Y. Sugimoto, Y. Yoshitomi, S. Tomita, "A method for detecting transitions of emotional states using a thermal facial image based on a synthesis of facial expressions", Robotics and Autonomous Systems, vol. 31, pp. 147-160, 2000.
[4] J. J. Lien, T. Kanade, J. F. Cohn, C. C. Li, "Detection, tracking, and classification of action units in facial expression", Robotics and Autonomous Systems, vol. 31, pp. 131-146, 2000.
[5] G. Medioni, A. R. J. Francois, M. Siddiqui, K. Kim, H. Yoon, "Robust real-time vision for a personal service robot", Computer Vision and Image Understanding, vol. 108, pp. 196-203, 2007.
[6] Korea Intellectual Property Office, Trend of Patent Application for Human-Robot Interaction, 2005.
[7] Institute for Information Technology Advancement, Human-Robot Interaction Technology (Intelligent Service Robot), 2006.
[8] P. Ekman and W. V. Friesen, "Emotion in the human face system", Cambridge University Press, San Francisco, CA, second edition, 1982.
[9] J. H. Rho, Y. H. Baek, S. R. Moon, Y. J. Kang, "A Study on Face Expression Recognition using LDA Mixture Model", Proceeding of the Spring Conference of the Korea Multimedia Society, pp. 50-54, 2006.
[10] Z. Zang, M. Lyons, M. Schuster and S. Akamatsu, "Comparison between Geometry-Based and Gabor Wavelets-Based Facial Expression Recognition Using Multi-Layer Perceptron", Proceedings of Third IEEE International Conference on Automatic Face and Gesture Recognition, pp. 454-459, 1998.
[11] J. J. Lien, T. Kanade, J. Cohn, and C. Li, "Detection, Tracking, and Classification of Action Units in Facial Expression", Journal of Robotics and Autonomous Systems, July 1999.
[12] K. Mase, "Recognition of facial expression from optical flow", IEICE Transactions on Information Systems, vol. J80-D, no. 6, June 1997.
[13] M. Turk, A. Pentland, "Eigenfaces for recognition", Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.
[14] P. Belhumeur, J. Hespanha, D. Kriegman, "Eigenfaces vs. fisherfaces: Recognition using class specific linear projection", IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711-720, 1997.
[15] S. Balakrishnama, A. Ganapathiraju, "Linear Discriminant Analysis – A Brief Tutorial", Institute for Signal and Information Processing, 1998.
[16] P. Ekman and W. V. Friesen, "Unmasking the face", Malor Books Press, 2003.
[17] K. C. Kwak, M. W. Lee, S. B. Pan, "Facial Expression Recognition by Tensor Representation from Video", The Journal of Korean Institute of Information Technology, vol. 10, no. 4, pp. 185-190, April 2012.


[18] Y. H. Joo, K. H. Jeong, M. H. Kim, J. B. Park, J. Lee, Y. J. Cho, "Facial Image Analysis Algorithm for Emotion Recognition", Journal of Korean Institute of Intelligent Systems, vol. 14, no. 7, pp. 801-806, December 2004.
[19] J. T. Joo, G. J. Park, G. E. Gho, H. C. Yang, G. B. Sim, "Emotion Recognition and Expression using Facial Expression", Proceedings of KFIS Spring Conference, vol. 17, no. 1, pp. 295-298, 2007.
[20] S. Lawrence, C. L. Giles, A. C. Tsoi, and A. D. Back, "Face Recognition: A Convolutional Neural-Network Approach", IEEE Trans. on Neural Networks, vol. 8, no. 1, pp. 98-113, 1997.
[21] G. Lv, "Recognition of multi-fontstyle characters based on Convolutional neural network", International Symposium on Computation Intelligence and Design, vol. 2, pp. 223-225, 2011.
[22] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-Based Learning Applied to Document Recognition", Proceeding of IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
[23] Y. Lecun, F. J. Huang, L. Bottou, "Learning methods for generic object recognition with invariance to pose and lighting", Proceeding of the Computer Vision and Pattern Recognition, vol. 2, pp. 97-104, 2004.
[24] X. X. Niu, and C. Y. Suen, "A novel hybrid CNN-SVM classifier for recognizing handwritten digits", Pattern Recognition, vol. 45, no. 4, pp. 1318-1325, 2012.
[25] H. Cecotti, and A. Greaser, "Convolutional Neural Network with embedded Fourier Transform for EEG classification", International Conference on Pattern Recognition, pp. 1-4, 2008.
[26] M. Matsugu, K. Mori, Y. Mitari, and Y. Kaneda, "Subject independent facial expression recognition with robust face detection using a convolutional neural network", Neural Networks, vol. 16, no. 5-6, pp. 555-559, 2003.
[27] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: recognition using class specific linear projection", IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711-720, 1997.
[28] Y. H. Shin, J. S. Ju, E. Y. Kim, T. Kurata, A. K. Jain, S. Park, K. Jung, "Automatic Facial Expression Recognition using Tree Structures for Human Computer Interaction", Korea Society of Industrial Information Systems, vol. 12, no. 3, pp. 60-68, 2007.
[29] M. Y. Lee, "Facial Expression Recognition by Tensor Representation from Video", Chosun University, 2012.
[30] Shuiwang Ji, Wei Xu, Ming Yang, and Kai Yu, "3D Convolutional Neural Networks for Human Action Recognition", IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 221-231, 2013.
[31] S. J. Han, K. C. Kwak, H. J. Go, S. S. Kim, M. G. Chun, "Facial Expression Recognition using ICA-Factorial Representation Method", Journal of Korean Institute of Intelligent Systems, vol. 13, no. 3, pp. 371-376, June 2003.

AUTHORS PROFILE

Young-Hyen Byeon received the B.Sc. and M.Sc. degrees from Chosun University, Gwangju, Korea, in 2012 and 2014, respectively. He is currently pursuing a Ph.D. degree. His research interests include human–robot interaction, computational intelligence, and pattern recognition.

Keun-Chang Kwak received the B.Sc., M.Sc., and Ph.D. degrees from Chungbuk National University, Cheongju, Korea, in 1996, 1998, and 2002, respectively. During 2003–2005, he was a Postdoctoral Fellow with the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada. From 2005 to 2007, he was a Senior Researcher with the Human–Robot Interaction Team, Intelligent Robot Division, Electronics and Telecommunications Research Institute, Daejeon, Korea. He is currently an Associate Professor with the Department of Control and Instrumentation Engineering, Chosun University, Gwangju, Korea. His research interests include human–robot interaction, computational intelligence, biometrics, and pattern recognition. Dr. Kwak is a member of IEEE, IEICE, KFIS, KRS, ICROS, KIPS, and IEEK.

(IJACSA) International Journal of Advanced Computer Science and Applications,
Vol. 5, No. 12, 2014

Social Media in Azorean Organizations: Policies, Strategies and Perceptions

Nuno Filipe Cordeiro, Teresa Tiago, Flávio Tiago, Francisco Amaral
Business and Economics Department, University of the Azores, Ponta Delgada, Portugal

Abstract: Social media have brought new opportunities, and also new challenges, for organizations. With them came the rise of a new context of action, largely influenced by the changing habits and behavior of the consumer. The purpose of this research is to analyze the views and strategies embraced by Azorean organizations, as well as the perceptions arising from the use of social media.

For this study, a quantitative, descriptive type of research was chosen, using an online survey. A total of 232 valid surveys were obtained, yielding a range of perceptions about the use of social media. The study hypotheses were tested using the Kruskal-Wallis analysis.

The results demonstrate that the majority of organizations involved in the study already use social media and that almost all of them use Facebook. The main reasons are to reach a wider audience, to increase notoriety and to communicate with customers. The most relevant difficulty felt after joining social media is the lack of resources and availability. Marketing initiatives and content creation are the most-used activities. Remarkably, more than half do not have a defined strategy, nor do they use measuring instruments to assess their presence. However, they consider that social media enhance their performance.

Social media is a widely studied topic from the consumer's point of view, but there is still little investigation from an organizational perspective. This work sought to contribute to the knowledge about the use and involvement of organizations in social media, especially in the peripheral context.

Keywords: Internet; Web 2.0; Social Media; User Generated Content

I. INTRODUCTION

Today, access to the Internet is greatly facilitated and related technologies are in continuous evolution. Social media constitute one of these evolutions, and are very popular among users. Sites like Facebook, YouTube, Twitter and others have attracted users around the world, who spend a considerable part of their time on them [1].

These sites have been modifying the habits and behaviors of better-informed consumers. One property that contributed to the success of social media was the possibility of two-way communication, allowing user content to be generated daily on these sites, without organizational control. This permitted clients to express their points of view about products and services, share experiences and provide recommendations. This content influences other consumers' purchasing decisions [2].

These facts piqued the interest of firms, motivating them to integrate social media into their business processes [3] and to use the chance to be closer to their clients, identify their behaviors and satisfy them [2]. The literature review reveals that many studies focus primarily on the consumer perspective, and some authors and research organizations suggest that the enterprise perspective is often overlooked by social media research.

This work is about firms' exploitation of social media to reach clients, especially in identifying which channels and activities they use, their motivations to use social media, whether they use it, which metrics are used to evaluate their presence on social media and, finally, the benefits of using social media. Additionally, it tries to unveil firms' strategies regarding social media, the investment policies and the best practices used to achieve the defined objectives.

For practical reasons, this study focuses on Azorean firms' use of social media. Data gathering was conducted over e-mail, with the survey sent to Azorean enterprises during February 2013.

The rest of this paper is organized as follows: Section 2 discusses the related literature reviewed for this research study. The subsequent section outlines the conceptual model and the hypotheses on which the model is based, and presents the methodology and a discussion of the empirical findings. Section 4 describes the paper's results. The implications for organizations as well as for research, and the limitations and scope for future research, are discussed in sections 5 and 6 respectively.


II. RESEARCH BACKGROUND

On the Internet, a wide range of applications are found that meet Web 2.0 specifications, varying in degree of interactivity, context, structure and objectives. They are divided into six different classes: (i) collaborative projects like Wikipedia; (ii) blogs and micro-blogs such as Twitter; (iii) content community sites similar to YouTube; (iv) massive multi-player online role-playing games such as World of Warcraft; (v) social virtual worlds like Second Life; and (vi) social networks, e.g., Facebook and LinkedIn [4].

Web 2.0 can be characterized as: (i) a change to online applications in the form of services, in which the traditional supplier that controls the software moves to a software service without proprietary systems and emphasizes applications with user-friendly features; (ii) continuous and incremental application development, based on users' collaboration and their experience with applications, with the objective of adding value for clients (the more users participate in this process, the more advanced the applications will be); and (iii) the emergence of a new set of business models supported on consumers' market niches that are traditionally hard to reach [5].

With these social and technological developments, the Internet is no longer a simple repository of static and passive information; it becomes active and participatory. At the same time, the users' behavior changed from that of an information consumer to that of an active producer of content, i.e., User Generated Content (UGC) [6].

The terms Web 2.0 and social media are used as synonyms, largely due to their proximity and interdependence, yet they are conceptually different. Social media comprise both the conduits and the content published interactively by users and organizations on sites that meet the Web 2.0 specifications [7]. So social media is characterized by the participation of users, the creation of digital communities and the connectivity between its elements [8].

The widespread acceptance of social media by consumers has enabled them to become involved in content consumption and in sharing experiences, opinions and comments [7]. This deep transformation in consumer behavior enabled them to feel accomplished by influencing the opinions of their peers and contributing to the collective intelligence [9]. This change was so visible that TIME magazine named as its 2006 Person of the Year not a particular personality but "YOU", thereby acknowledging the millions of customers who produce information on social media [10], [11].

Studies of consumers who search for information on products and services in social media suggest that these clients have one or more features of the Social Technographic profiles when surfing social sites [12], [13]:

1) Creators: Highest level. At least once a month, they publish blogs and Web pages, upload content created by them in video and audio format, and write and publish articles or stories. Essentially, they are the ones who produce content.
2) Talkers: Post on Twitter and update their status on social networks at least once a week.
3) Critics: Publish ratings or reviews of products/services, comment on blogs and forums and edit wikis. They build on what has been published and commented.
4) Collectors: They use RSS feeds, vote on websites and add bookmarks on websites and photos. Their main purpose is tantamount to storing content.
5) Participants: Maintain a profile on a social network and visit social networking sites. Their aim is to relate to other people.
6) Spectators: They view what others produce on blogs, videos, podcasts and forums.
7) Inactive: Do not practice any of the above activities.

The process of acquisition of goods and services can be outlined by the Social Feedback Cycle model, which consists of four steps: (i) awareness; (ii) consideration (one looks at all the attributes that influence the purchase); (iii) purchase; and (iv) post-purchase, considering the social aspects of the Web and the impact of word of mouth (sharing one's experience with the product / service purchased), which can influence the process of attracting future customers [14].

The use of social media changed the way organizations do business. They began to communicate with consumers, organizations and suppliers, thereby enabling the creation and management of customer relationships as well as increasing their perception of consumer needs. The proper management of these relationships helps eliminate damage to reputation and loss of revenues due to unhappy customers. Indeed, companies can take advantage of satisfied clients who share their enthusiasm and experience in social media [15].

Many organizations have used the features provided by Web 2.0 tools to improve internal communication and leverage the collective intelligence of the organization. Given their features, blogs, wikis, RSS feeds, etc. are the most-used Web 2.0 tools to facilitate communication, collaboration and sharing of internal information [16], [17].

To have a presence on social media, an organization should initially define an implementation strategy. This strategy can follow the POST method (People Objective Strategy Technology) [12]:

1) Define the target audience and understand their activities.
2) Set goals in social media by listening to the clients, promoting the brand, energizing consumers and identifying how they can influence and support potential customers using wikis and forums, and finally engage customers in developing new products / services.
3) Define a strategy for attaining those objectives and plan the relationships with the clients on social media.
4) Define the social media types where they will be.

The presence of organizations on social media enables them to improve credibility with consumers, strengthen ties of loyalty in the emotional and behavioral dimensions, and achieve greater visibility and a much wider audience at lower costs. Exploring the UGC, companies have the ability to better understand consumers, gather information about the strengths


and weaknesses of their products / services and take advantage type, size and sales volume, as predictors of the decision
of "word of mouth." Despite the advantages that social media process. The first subdivision of H2 is: The use of social media
bring to the organizations, it may also involve risks and threats is influenced by demographic characteristics of the
to businesses because they no longer control what customers organizations. Table II lists constructors of this hypothesis.
say about the product / service, commercial treatment and long
term. TABLE II. DEMOGRAPHICS DIMENSIONS OF H2A

III. FRAMEWORK AND HYPOTHESIS Dimensions Hypotheses


The use of social media is equal, on average, for
In the current global economic environment, the ability to H2a1: Island
the different islands.
reach customers, interactivity and low cost associated with
social media types make them even more attractive. However, The use of social media is equal, on average, for
H2a2: Activity Sector
the different sectors of activity.
not all organizations are active in this field and / or explore all
possibilities. H2a3: Enterprise The use of social media is equal, on average, for
Dimension the different organizational dimensions.
In the last decade, the academic community has paid great
attention to users of social media, while the business H2a4: Type of Client
The use of social media is equal, on average, for
component has been less considered. Therefore, it is pertinent the different types of customers.
to deepen the study of the determinants of the decision makers The use of social media is equal, on average, for
H2a5: Offer Type
regarding the use of social media and the process that leads the different offers types.
them to adopt more active or less active presences on different Regarding the decision process concerning the involvement
social media sites. The model under study examines how in social media, it was understood to be relevant to the above
organizational characteristics and perceptions influence the decision process and barriers to the use of social media in businesses. In this sense, the following hypotheses were constructed for testing in this study.

The first hypothesis is related to the involvement of organizations in social media activities, as discussed by several authors [12], [18]. Hypothesis H1: The decision process about the involvement is influenced by perceptions that organizations have about social media. This was subdivided as shown in Table I. It is intended to test whether the involvement is influenced by the perceptions that organizations have.

TABLE I. DIMENSIONS OF SUB HYPOTHESES OF H1

Dimensions | Hypotheses
H1a: Motivation | Involvement degree in social media is equal, on average, for the different categories of motivation to use these media.
H1b: Benefits | Involvement degree in social media is equal, on average, for the different categories of benefits perceived by organizations.
H1c: Constraints | Involvement degree in social media is equal, on average, for the separate categories of difficulties perceived by organizations.

The decision process is divided into the usage decision process and the engagement process, so that the hypothesis H2: "The decision making process is influenced by organizational characteristics" was subdivided into three sub-hypotheses. The decision process relating to the use of social media was studied by [19] in order to find differences in the use of these media in SMEs, where the type of customer was B2B. These authors used demographic characteristics of organizations, e.g., offer

process with the demographic characteristics of the organizations and was used in studies of [3], [20], as well as policies and strategies implemented, which were referenced in several studies [3], [21], [22], [23], [24], [20]. Thus, the second sub-hypothesis H2b is: Engaging in social media is influenced by demographic characteristics of organizations, as shown in Table III.

TABLE III. DEMOGRAPHICS DIMENSIONS OF H2B

Dimensions | Hypotheses
H2b1: Island | Engaging in social media is equal, on average, for the different islands.
H2b2: Activity Sector | Engaging in social media is equal, on average, for the different sectors of activity.
H2b3: Enterprise Dimension | Engaging in social media is equal, on average, for the different organizational dimensions.
H2b4: Type of Client | Engaging in social media is equal, on average, for the different types of customers.
H2b5: Offer Type | Engaging in social media is equal, on average, for the different offer types.

The third sub-hypothesis H2c is defined as: The involvement in social media is a function of policies and strategies implemented, as seen in Table IV.

Some researchers studied perceived barriers to the use of social media in SMEs [3], where the type of customer was B2B. Thus, we sought to test whether the barriers that prevent the use of social media are influenced by demographic characteristics of B2B organizations. We defined H3: The barriers that constrain the use of social media are influenced by demographic characteristics of organizations, as summarized in Table V.
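Each sub-hypothesis in this section states that a score is equal, on average, across categories, which is the null hypothesis of the Kruskal-Wallis test that the paper applies in Section IV. As a minimal sketch of how the H statistic of that test is formed, the function below ranks the pooled observations and compares rank sums per group. The function names and the sample data are hypothetical, no tie correction is applied, and the source does not say which software the authors used.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            ranks[order[j]] = mean_rank
        pos = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """H = 12 / (n (n + 1)) * sum(R_i^2 / n_i) - 3 (n + 1), without tie correction."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


# Hypothetical involvement scores for three categories of one dimension.
h_stat = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

A larger H means the rank sums differ more between categories; the p-value is then usually read from a chi-square distribution with (number of groups - 1) degrees of freedom.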


TABLE IV. STRATEGIC AND CORPORATE POLICIES DIMENSIONS OF H2C

Dimensions | Hypotheses
H2c1: Channels Used | Engaging in social media is equal, on average, for the various channels used.
H2c2: Strategy | Engaging in social media is equal, on average, for different categories of strategy existence.
H2c3: Use of Measurement Tools | Engaging in social media is equal, on average, for different categories of existence of measurement to assess their presence in the media.
H2c4: Human Resources Allocation | Engaging in social media is equal, on average, for the different kinds of human resources allocation.
H2c5: Time | Engaging in social media is equal, on average, for the different categories of time used on these media.
H2c6: Frequency | Engaging in social media is equal, on average, for the different frequency categories of social media usage.
H2c7: Activities Undertaken | Engaging in social media is equal, on average, for the different categories of activities performed in these media.
H2c8: Investment Policy | Engaging in social media is equal, on average, for the different budget categories used in these media.

TABLE V. DEMOGRAPHICS DIMENSIONS OF H3

Dimensions | Hypotheses
H3a: Island | The barriers preventing the use of social media are equal, on average, for the different islands.
H3b: Activity Sector | The barriers preventing the use of social media are equal, on average, for different sectors of activity.
H3c: Dimension | The barriers preventing the use of social media are equal, on average, for the different organizational dimensions.
H3d: Type of Client | The barriers preventing the use of social media are equal, on average, for the different kinds of customers.
H3e: Offer Type | The barriers preventing the use of social media are equal, on average, for the different types of offering.

The hypotheses described above reflect the study's aim to explore social media use by Azorean firms.

IV. METHODOLOGY AND RESULTS

After the extensive literature review and the definition of the research methodology based on the dimensions and variables described above, we applied a methodology consisting of four phases: 1- Sample definition; 2- Questionnaire development; 3- Data collection; 4- Statistical analysis.

Based on the literature review, a questionnaire was prepared and pre-tested, which led to the reorientation of some questions. This study was conducted on companies in the Autonomous Region of the Azores, which is part of Portugal. There are two major reasons to analyze a region such as the Azores. The first is that this region has an above-average Internet utilization rate within Portugal [25]. The second stems from the fact that the Azores are nine islands located in the middle of the North Atlantic, and thus have a pronounced geographical isolation. The sample gathered is therefore composed of Azorean organizations in various sectors of activity and of various dimensions, which may or may not use social media. The type of sampling used was non-probability and, within this, snowball sampling.

Synthetic indices were created from a set of aggregate indicators that "measure" the constructs in the study and showed good reliability. In a second phase, in an inferential analysis, we used a nonparametric Kruskal-Wallis test in order to validate the proposed conceptual model. The significance level used for inferential statistics was 5%.

Data collection took place between January 30, 2013 and March 5, 2013. Requests were sent to 1515 Azorean organizations and 259 questionnaires were obtained, resulting in a response rate of 17.09%. However, 27 were considered invalid because they were incomplete, so the effective sample size was 232.

Of the 232 organizations analyzed, 74% are now using social media. A further 7.8% are planning to do so in the next 12 months and 18.1% do not intend to have a presence on social media. Regarding firm size, based on the number of employees, 97.8% are SMEs, and the remaining 2.2% (5 organizations) are large organizations. Also, 60.5% have 1-9 workers, 31.4% have 10-49 workers, 7% have between 50 and 249 employees and 1.2% have more than 250 staff. With regard to the location of headquarters, most come from the island of São Miguel (67.24%). If we add the island of Terceira, we have 78.02% of the organizations. The distribution across the other islands is as follows: Faial has 6.47% of the organizations, Pico and São Jorge have 4.31% each, and Santa Maria, Graciosa, Flores and Corvo, taken together, amount to about 7%.

Businesses with a presence on social media belong mainly to the following sectors of activity: tourism, services and trade; together these represent 78.5% of the organizations present. The sectors of manufacturing, media, telecommunications and information technology and other sectors represent values on the order of 7%.

The tourism sector, with 36%, was the one that had the largest number of organizations present on social media; within this, most organizations are in hospitality (19.8%), followed by travel agencies, tour operators and tourism animation with 8.7% and restaurants/bars with 7.6%.

Regarding the definition of a strategy for social media presence, more than three-quarters of Azorean organizations did not define a strategy. Only 18.6% of organizations have a defined strategy, 56.4% have a strategic direction and 25% have neither.

Synthetic indicators for the following dimensions were created:

• Digital media used - Index ranging from 1 (social medium used) to 2 (social medium not used). Reliability, measured by Cronbach's alpha, is 0.598, which can be considered satisfactory [32]. Facebook has an average utilization of 1.05, followed by LinkedIn with an average of 1.81. Podcast sites, wikis and Pinterest have an average of 1.99, indicating very little use.


• Motivations for using social media - Index ranging from 1 (not important) to 5 (very important). Reliability, measured by Cronbach's alpha, is 0.834, which can be considered good [32]. The main motivations of firms with a social media presence were "Reaching a wider audience" and "Increase sales", both with an average of 4.66, followed by "Create brand awareness or product" with 4.51. "Recruit collaborators" with 2.57 and "Communicating with suppliers and partners" with 3.29 were rated average to poor.

• Benefits of the use of social media - Index ranging from 1 (not relevant) to 5 (very relevant). Reliability, measured by Cronbach's alpha, is 0.907, which can be considered good [32]. The main benefits for the companies with a presence on social media were "Increasing brand or product awareness" with an average of 4.56, followed by "Attract new customers" with 4.55 and "Increase sales" with 4.42. The benefits with the lowest averages were "Help in recruiting" with 2.88 and "Analyze competition" with 3.53.

• Difficulties in exploring social media - Index ranging from 1 (not relevant) to 5 (very relevant). Reliability, measured by Cronbach's alpha, is 0.909, which can be considered good [32]. The main difficulties that companies encounter in the use of social media were "Measuring the impact on your business" with an average of 3.33, followed by "Availability or lack of resources" with 3.27 and "Integration with other marketing initiatives" with 3.26. "Lack of support from top management of the organization" had an average of 2.51 and "Reluctance of managers to share information" had 2.63.

• Barriers to adoption of social media - Index ranging from 1 (not relevant) to 5 (very relevant). Reliability, measured by Cronbach's alpha, is 0.804, which can be considered good [32]. The main barriers to the use of social media in business were "Loss of control of the message" with an average of 3.43, followed by "Lack of adaptation to company culture" with 3.40 and "Difficult to measure and monitor the benefits" with 3.35. Less significant barriers were "Competitors are not using those assets" with an average of 2.30 and "These means are not important within the industry in which the organization operates" with 2.53.

To test the hypotheses of the conceptual model, we used the nonparametric Kruskal-Wallis method. As seen in Table VI for H1, only benefits influence the degree of involvement, so the decision process regarding the involvement is not influenced by the perceptions that organizations have, except for the perceived benefits.

TABLE VI. RESULTS OF KRUSKAL-WALLIS TEST FOR H1

Hypotheses | P | Conclusion
H1a | 0.065 | Not rejected
H1b | 0.004 | Rejected
H1c | 0.354 | Not rejected

As can be seen in Table VII for H2a, the use of social media is influenced by demographic characteristics of organizations; however, the headquarters' location and the main type of supply do not affect the use. For the latter hypothesis, if we had used a significance level of 10%, the result would be different.

TABLE VII. RESULTS OF KRUSKAL-WALLIS TEST FOR H2A

Hypotheses | P | Conclusion
H2a1 | 0.256 | Not rejected
H2a2 | 0.000 | Rejected
H2a3 | 0.004 | Rejected
H2a4 | 0.004 | Rejected
H2a5 | 0.053 | Not rejected

As can be seen in Table VIII for H2b, involvement in social media is not influenced by demographic characteristics of the organizations, with one exception: the main type of supply (H2b5) influences the involvement. The hypothesis tests for H2c are summarized in Table IX: involvement in social media is influenced by the policies and strategies implemented, except that the time of use of social media does not influence involvement.

TABLE VIII. RESULTS OF KRUSKAL-WALLIS TEST FOR H2B

Hypotheses | P | Conclusion
H2b1 | 0.093 | Not rejected
H2b2 | 0.062 | Not rejected
H2b3 | 0.587 | Not rejected
H2b4 | 0.136 | Not rejected
H2b5 | 0.050 | Rejected

Tests for hypothesis H3 are summarized in Table X. The barriers that prevent the use of social media are not influenced by demographic characteristics of organizations, except for the type of customer, which does influence the barriers.
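The "Rejected" and "Not rejected" conclusions in these tables follow from comparing each reported p-value with the chosen significance level. The sketch below applies that rule to the p-values reported in Table VII and shows why H2a5 changes at the 10% level. The function name is hypothetical, and the use of `<=` (rather than `<`) is an assumption chosen so that a reported p of 0.050, as for H2b5 in Table VIII, counts as rejected.

```python
# p-values as reported in Table VII (H2a), decimal commas written as points.
P_VALUES_H2A = {
    "H2a1": 0.256,
    "H2a2": 0.000,
    "H2a3": 0.004,
    "H2a4": 0.004,
    "H2a5": 0.053,
}


def conclusions(p_values, alpha=0.05):
    """Label each null hypothesis 'Rejected' when p <= alpha, else 'Not rejected'."""
    return {
        name: ("Rejected" if p <= alpha else "Not rejected")
        for name, p in p_values.items()
    }


at_5_percent = conclusions(P_VALUES_H2A, alpha=0.05)
at_10_percent = conclusions(P_VALUES_H2A, alpha=0.10)

# Hypotheses whose conclusion depends on the significance level chosen.
changed = [h for h in P_VALUES_H2A if at_5_percent[h] != at_10_percent[h]]
```

Only H2a5 (p = 0.053) ends up in `changed`, which matches the remark in the text about the 10% level.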

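The reliability values quoted for the synthetic indices (0.598 to 0.909) are Cronbach's alpha coefficients. As a rough sketch of how such a coefficient is obtained from questionnaire items, the function below uses the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name and the answer data are hypothetical; the survey responses themselves are not given in the paper.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """items: one list of scores per questionnaire item, aligned by respondent."""
    k = len(items)
    # Total score of each respondent across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(pvariance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / pvariance(totals))


# Hypothetical answers of four respondents to three 1-to-5 items.
answers = [
    [1, 2, 3, 4],
    [2, 1, 4, 3],
    [1, 3, 2, 4],
]
alpha = cronbach_alpha(answers)
```

The same variance estimator must be used for the items and for the totals; population variance is used here, and sample variance would give the same ratio.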

TABLE IX. RESULTS OF KRUSKAL-WALLIS TEST FOR H2C

Hypotheses | P | Conclusion
H2c1 | 0.025 | Rejected
H2c2 | 0.000 | Rejected
H2c3 | 0.000 | Rejected
H2c4 | 0.001 | Rejected
H2c5 | 0.645 | Not rejected
H2c6 | 0.000 | Rejected
H2c7 | 0.000 | Rejected
H2c8 | 0.000 | Rejected

TABLE X. RESULTS OF KRUSKAL-WALLIS TEST FOR H3

Hypotheses | P | Conclusion
H3a | 0.442 | Not rejected
H3b | 0.070 | Not rejected
H3c | 0.429 | Not rejected
H3d | 0.045 | Rejected
H3e | 0.765 | Not rejected

V. DISCUSSION AND CONCLUSIONS

Tourism is the sector of activity with the greatest adoption of social media, which is justified because it is an industry where these media have a significant impact on consumer behavior, such as the choice of destination and the planning of the entire trip [26]. The sectors of agribusiness, construction, fuel distribution and transport were those with the lowest adherence.

The reasons that organizations cite for the use of social media are mainly related to reaching a wider audience, creating awareness, communicating with customers and increasing sales, which is consistent with the existing literature and other studies [3], [20], [27], [28]. A less prevalent reason is recruiting.

Organizations still perceive a significant number of benefits, particularly in marketing and commercial terms. With regard to perceived difficulties, the most highlighted relate to the lack of resources and the measurement of the business impact (ROI), which is in agreement with some authors [29]. However, the results suggest that Azorean organizations do not regard the absence of a strategy or the loss of control of the message as obstacles to the success of their initiatives, which is not in agreement with the existing literature. The lack of support from top management and the reluctance of managers to share information are identified as the least impactful difficulties.

Through hypothesis testing, it was found that involvement is influenced only by the perceived benefits, which is in some way related to the study by [5], which argues that engagement depends on the goals. In terms of usage, it was found that the variables of size, sector of activity and type of client influence the utilization of social media, pointing in the same direction as the study [3].

With regard to the involvement, it was found that organizational demographic characteristics did not influence the degree of involvement, which does not corroborate [5]. Policies and strategies influence the degree of involvement, which is in agreement with the results presented by the same authors.

VI. LIMITATIONS AND FUTURE RESEARCH

We consider this study a step forward in the analysis of the current status of social media in organizations and in confirming some of the findings in the literature review.

As observed in other countries and reported by several authors, there is a sizable group of companies that have already adopted social media [23], [24], [30], [31]. Most Azorean organizations analyzed use these resources, essentially preferring social networks like Facebook.

However, the results indicate that organizations are still at an early stage of involvement, allocating few human and financial resources. Firms using social media go directly to implementation, neglecting strategy definition and monitoring.

Because of the constraints of a research project limited in time and in geographic area, some limitations should be corrected in future investigations. The sample size and the kind of sampling do not guarantee that the findings are generally representative. Future work should consider larger samples.

As a continuation of this research, the inclusion of new independent variables would make it possible to verify their influence on the dependent variables. It is important, for example, to include variables such as: the existence of an internal policy on the use of social media; top management support of employee participation in social media; and analysis of productivity during working hours. It is also important to know how the use of social media in organizations is influenced by the number of hours per week spent using the media, how employees are affected by these resources, whether these employees are the people who manage online communities, and whether this is done by people or organizations external to the firm.

Future studies should identify and analyze the reliability of: (1) the most commonly used metrics; (2) the tools used; (3) the frequency of use; (4) the difficulties perceived when using social media; and (5) whether the results obtained are helping to achieve the goals set by organizations.

REFERENCES

[1] comScore, "It's a Social World: Top 10 Need-to-Knows About Social Networking and Where It's Headed -," 2011. [Online]. Available: http://www.comscore.com/por/Insights/Presentations_and_Whitepapers/2011/it_is_a_social_world_top_10_need-to-knows_about_social_networking. [Accessed: 09-May-2014].
[2] P. Kotler, H. Kartajaya, and I. Setiawan, Marketing 3.0: from products to customers to the human spirit. John Wiley & Sons, 2010.
[3] N. Michaelidou, N. T. Siamagka, and G. Christodoulides, "Usage, barriers and measurement of social media marketing: An exploratory investigation of small and medium B2B brands," Ind. Mark. Manag., vol. 40, no. 7, pp. 1153–1159, Oct. 2011.
[4] K. Peters, Y. Chen, A. M. Kaplan, B. Ognibeni, and K. Pauwels, "Social Media Metrics — A Framework and Guidelines for Managing Social Media," J. Interact. Mark., vol. 27, no. 4, pp. 281–298, Nov. 2013.


[5] E. Constantinides and S. J. Fountain, "Web 2.0: Conceptual foundations and marketing issues," J. Direct, Data Digit. Mark. Pract., vol. 9, no. 3, pp. 231–244, Mar. 2008.
[6] T. O'Reilly, "What Is Web 2.0," 2005. [Online]. Available: http://oreilly.com/web2/archive/what-is-web-20.html. [Accessed: 17-May-2014].
[7] P. R. Berthon, L. F. Pitt, K. Plangger, and D. Shapiro, "Marketing meets Web 2.0, social media, and creative consumers: Implications for international marketing strategy," Bus. Horiz., vol. 55, no. 3, pp. 261–271, May 2012.
[8] M. T. Tiago and J. Veríssimo, "Digital Marketing and Social Media: Why bother?," Bus. Horiz., vol. 57, pp. 703–708.
[9] J. Surowiecki, "The Wisdom of Crowds," Am. J. Phys., vol. 75, p. 336, 2005.
[10] L. Grossman, "You -- Yes, You -- Are TIME's Person of the Year - TIME," Time, 2006. [Online]. Available: http://content.time.com/time/magazine/article/0,9171,1570810,00.html. [Accessed: 26-Jun-2014].
[11] Wikipedia, "You (Time Person of the Year)," 2007. [Online]. Available: http://en.wikipedia.org/wiki/You_(Time_Person_of_the_Year).
[12] J. Bernoff and C. Li, "The POST Method: A systematic approach to social strategy," 2007. [Online]. Available: http://forrester.typepad.com/groundswell/2007/12/the-post-method.html. [Accessed: 26-Jun-2014].
[13] J. Bernoff, "Social Technographics: Conversationalists get onto the ladder," 2010. [Online]. Available: http://forrester.typepad.com/groundswell/2010/01/conversationalists-get-onto-the-ladder.html. [Accessed: 26-Jun-2014].
[14] D. Evans, Social media marketing: An hour a day. 2008.
[15] T. Weinberg, The new community rules: Marketing on the social web. O'Reilly Media, Inc., 2009.
[16] L. S. L. Lai and E. Turban, "Groups Formation and Operations in the Web 2.0 Environment and Social Networks," Gr. Decis. Negot., vol. 17, no. 5, pp. 387–402, Jun. 2008.
[17] A. Richter, A. Stocker, S. Müller, and G. Avram, "Knowledge management goals revisited: A cross-sectional analysis of social software adoption in corporate environments," VINE, vol. 43, no. 2, pp. 132–148, 2013.
[18] G. Drury, "Opinion piece: Social media: Should marketers engage and how can it be done effectively?," J. Direct, Data Digit. Mark. Pract., vol. 9, no. 3, pp. 274–277, Mar. 2008.
[19] N. Michaelidou, N. T. Siamagka, and G. Christodoulides, "Usage, barriers and measurement of social media marketing: An exploratory investigation of small and medium B2B brands," Ind. Mark. Manag., vol. 40, no. 7, pp. 1153–1159, Oct. 2011.
[20] J. M. Veríssimo and G. Coimbra, "Marketing Digital em Portugal: Situação Actual e Tendências," 2011.
[21] D. E. R. Service, "Enterprise 2.0 - Harnessing social media," Deloitte, 2011. [Online]. Available: https://www.deloitte.com/assets/Dcom-Canada/Local Assets/Documents/ERS/ca_en_ers_enterprise_20_050611.pdf. [Accessed: 27-Jun-2014].
[22] H. B. R. A. Services, "the New Conversation: taking Social Media from talk to action," 2010.
[23] M. A. Stelzner, "2011 Social Media Marketing Report - How Marketers Are Using Social Media to Grow Their Businesses," Social Media Examiner, 2011. [Online]. Available: http://www.socialmediaexaminer.com/SocialMediaMarketingReport2011.pdf. [Accessed: 27-Jun-2014].
[24] M. A. Stelzner, "2012 Social Media Marketing Industry Report - How Marketers Are Using Social Media to Grow Their Businesses," Social Media Examiner, 2012. [Online]. Available: http://www.socialmediaexaminer.com/SocialMediaMarketingIndustryReport2012.pdf. [Accessed: 27-Jun-2014].
[25] INE, "Inquérito à Utilização de Tecnologias da Informação e da Comunicação nas Empresas 2012," 2012.
[26] J. Miguéns, R. Baggio, and C. Costa, "Social media and Tourism Destinations: TripAdvisor Case Study," in Advances in Tourism Research, 2008, vol. 2008, pp. 1–6.
[27] F. Cipriani, "Mídias sociais nas empresas - O relacionamento online com o mercado," Deloitte, 2010. [Online]. Available: http://www.deloitte.com/assets/dcom-brazil/local assets/documents/estudos e pesquisas/apresentacao_midiassociais.pdf. [Accessed: 30-Jun-2014].
[28] Webmarketing123, "2012 State of Digital Marketing Report," 2012.
[29] D. Evans and J. McKee, Social media marketing: the next generation of business engagement. John Wiley & Sons, 2010.
[30] M. A. Stelzner, "2013 Social Media Marketing Industry Report - How Marketers Are Using Social Media to Grow Their Businesses," Social Media Examiner, 2013. [Online]. Available: http://www.socialmediaexaminer.com/SocialMediaMarketingIndustryReport2013.pdf. [Accessed: 27-Jun-2014].
[31] B. D. Weinberg and E. Pehlivan, "Social spending: Managing the social media mix," Bus. Horiz., vol. 54, no. 3, pp. 275–282, May 2011.
[32] R. A. Peterson, "A meta-analysis of Cronbach's coefficient alpha," Journal of consumer research, pp. 381-391, 1994.


Weighted Marking, Clique Structure and Node-Weighted Centrality to Predict Distribution Centre's Location in a Supply Chain Management

Akanmu, Amidu A. G., School of Computing, University of Kent, Medway, ME4 4AG, Kent
Wang, Frank Z., School of Computing, University of Kent, Canterbury, Kent
Yamoah, Fred A., School of Business, University of Kent, Medway, ME4 4AG, Kent

Abstract—Despite the importance attached to the weights or strengths on the edges of a graph, a graph is only complete if it has both nodes and edges. As such, this paper highlights the fact that the node-weight of a graph is also a critical factor to consider in any graph/network's evaluation, rather than the link-weight alone as commonly considered. In fact, the combination of the weights on both the nodes and edges, as well as the number of ties, together contribute effectively to the measure of centrality for an entire graph or network, thereby clearly showing more information. Two methods which take into consideration both the link-weights and node-weights of graphs (the Weighted Marking method of prediction of location and the Clique/Node-Weighted centrality measures) are considered, and the result from the case studies shows that the clique/node-weighted centrality measures give an accuracy 18% higher than the weighted marking method in the prediction of the Distribution Centre location in Supply Chain Management.

Keywords—Centrality measures; Graph; Network; Clique

I. INTRODUCTION

The formal theory of social network analysis encompasses centrality measures, and these are employed in this research, which dwells on the merging of weights (link-weights and node-weights) to evaluate network topologies and make a prediction. The strength attached to the nodes, also called the node-weight, represents a certain attribute of a particular node (e.g. population of a city), and the same goes for the strength attached to edges (e.g. distance between cities) [1].

In their study of weighted networks, [2] carried out a statistical analysis of complex networks whose edges have been assigned a given weight (the flow or the intensity). According to them, such networks can generally be described in terms of weighted graphs, and a more complete view of complex networks is provided by the study of the interactions defining the links of those systems. Although [3], [4], [2], [5] have emphasized only the attachment of weights to the edges and not to the nodes in their various studies, [6] and [7] have considered both the weights on the edges and the number of edges attached to a particular node. This work, however, concerns itself with both nodes and edges while considering the degree, eigenvector, betweenness and closeness centralities.

The importance of the location of a distribution centre is echoed by [8]: "Moreover, the advantage of an optimal location for distribution centre is not only to reduce transportation costs, but also to improve business performance, increase competitiveness and profitability".

Although [9] indicated that "Rarely do members of a group have direct ties with each and every member", the case studies we focused on are the road links (edges) to shops coupled with the node weights (sales values of each shop), and these are such that every shop has a road link to every other shop, thereby forming a clique.

Link-Weighted Centrality

Equation (1) below represents the weighted degree centrality with respect to the edges or links.

$$S_p = C_D^{W}(p) = \frac{\sum_{q}^{N} w_{pq}}{n-1} \qquad (1)$$

where p is the focal node; q = adjacent node; w = weight attached to the edge; and n = total number of nodes in the graph.

The above argument is now extended to the weighted centrality of the four measures, i.e. Degree, Closeness, Betweenness and Eigenvector. The degree centrality of any node s, taking cognisance of the strength of the incident edges, is herein defined as the weighted degree centrality of node s and is represented in normalised form as

$$C_D^{W}(s) = \frac{\sum_{t}^{n} w_{st}}{n-1} \qquad (2)$$

where the numerator is the sum of the weights $w_{st}$ of the edges connected to the particular source node s, and t represents a particular target node. In the same vein, the weighted closeness centrality, $C_C^{W}(s)$, is represented by

$$C_C^{W}(s) = \frac{n-1}{\sum d_w(s,t)} \qquad (3)$$


where $d_w(s,t)$ is the weight of the geodesic path between s and t, while the weighted betweenness centrality is

$$C_B^{W}(v) = \frac{\sum_{s \neq v \neq t \in V,\; s \neq t} \sigma^{w}_{st}(v) / \sigma^{w}_{st}}{(n-1)(n-2)} \qquad (4)$$

where $\sigma^{w}_{st}$ is the number of shortest geodesic paths from s to t, $\sigma^{w}_{st}(v)$ is the number of shortest geodesic paths from s to t that pass through node v, and w denotes the weights assigned to the ties. Similarly, the weighted eigenvector centrality could be seen as

$$x = A^{w} x \qquad (5)$$

where $A^{w}$ is a square matrix of the weights on the edges and x is an eigenvector of $A^{w}$.

A tuning parameter α was introduced by [7] to determine the relative importance of the number of ties compared to the weights on the ties. Equation (6) below represents the product of the degree of a focal node and the average weight to these nodes as adjusted by the introduced tuning parameter. So, for weighted degree centrality at α we have:

$$C_D^{W\alpha}(p) = k_p \times \left(\frac{s_p}{k_p}\right)^{\alpha} = k_p^{(1-\alpha)} \times s_p^{\alpha} \qquad (6)$$

where $k_p$ = degree of node p, $s_p = C_D^{W}(p)$ as defined in (1) above, and α ≥ 0.

So for weighted closeness centrality at α we have

$$C_C^{W\alpha}(i) = k_i \times \left(\frac{s_i}{k_i}\right)^{\alpha} = k_i^{(1-\alpha)} \times s_i^{\alpha} \qquad (7)$$

where $k_i$ = degree of node i and $s_i = C_C^{W}(i)$ is as defined in (3) above, α ≥ 0, and similarly for the degree centrality, betweenness centrality and eigenvector centrality.

In supply chain management (SCM), the node-weights could be any of the volume of sales, cost of storage or turnover at a depot/store, while the edges are the distances between each depot and a proposed distribution centre (DC). TESCO shops of different counties are used as case studies here.

For the SCM, since the shops sampled are maximally connected, the advantage of the clique structure was exploited to map out different cliques of shops, making the most central node of the chosen clique the representative of that clique for the purpose of predicting a proposed DC.

Node-Weight Modified Centrality Measure

From (6), when α = 0 only the degree of nodes is measured, and when α = 1 only the weights on the ties are measured. In view of this, only the cases where α is less than 1 or greater than 1 are considered, specifically α = 0.25, 0.5, 0.75, 1.25, 1.5 and 1.75.

A tuning parameter β was introduced by [1] to take care of the weightedness of the nodes, whereas the tuning parameter α was applied to the degree/strength of the edges. The new equation obtained by introducing the tuning parameter β is the product of the degree of a focal node, the average weight to these nodes as adjusted by the tuning parameter α, and the weight accorded to each node as adjusted by β. So, for weighted degree centrality at α and β we now have

$$C_D^{W\alpha\beta}(i) = k_i \times \left(\frac{s_i}{k_i}\right)^{\alpha} \times z_i^{\beta} = k_i^{(1-\alpha)} \times s_i^{\alpha} \times z_i^{\beta} \qquad (8)$$

where $k_i$ = degree of node i, $s_i = C_D^{W}(i)$ as defined above, $z_i$ = weight of node i, α ≥ 0, and {β : -1 ≤ β ≤ 1}.

The choice of the value of β depends on the effect the weight has on the new centrality measure: if the weight has a positive effect (e.g. profit), a positive value of β is employed; otherwise (e.g. loss), a negative value is used in our calculation.

The next sections are organised as follows. Section II discusses the two methods, the Weighted Marking Method and the Clique Structure/Node-Weight Modulated Centrality Measure, as applicable to supply chain management. Section III explains the implementation of the exercise with tools such as UCINET, tnet and Excel, while Section IV shows the predictions of the new distribution centres. Section V concludes with the discussion of the results.

II. WEIGHTED MARKING METHOD AND CLIQUE STRUCTURE/NODE-WEIGHT MODULATED CENTRALITY MEASURES APPLIED TO SUPPLY CHAIN MANAGEMENT

A. Weighted Marking Method

Fig. 1. Figure showing schematic diagram of Weighted Marking Method, with cones as shops & EDC as existing DC

Three main stages were proposed by [8] in choosing a location for a DC using the Weighted Marking Method (WMM):

Stage 1 – Identification of a general geographical area for the DC based on the principle of centre of gravity, while considering socio-economic factors. For the Scotland region in our case study, Glasgow and Edinburgh are considered, being the most populated and with tendencies for more economic activities.


Stage 2 – Identification of alternative locations for the DC; these are the shops (cones) as in Fig. 1, where EDC is the existing DC. The criteria considered for the cities in Stage 1 are: C1 (proximity to customer bases); C2 (expansion capability); C3 (percentage of unemployment [to measure availability of labour force]); and C4 (average income of residents [to measure standard of living]).

Stage 3 – Selection of specific sites among the alternative locations in Stage 2 using a quantitative approach, after having set a certain threshold (e.g. composite points greater than or equal to 5), i.e. the composite point for each node is calculated using the formula below:

$$\text{Composite Point} = \sum_{i=1}^{4} (\text{point related to criterion } i \times \text{weighting factor of criterion } i) \qquad (9)$$

Thereafter the minimum of the products of sales volume and distance is chosen, as in (10) below:

$$\min\{VD\} = \min\{\text{Volume of Sales} \times \text{Distance}\} \qquad (10)$$

Applying the technique of [8], the result of Table 1 was obtained:

From above, node 79 is the winner and its distance from the existing DC is 14.1 units; therefore the error of prediction is 14.1/60 * 100 = 23.5%, which gives an accuracy of 76.5%.

The network coverage of an existing distribution centre (DC) located in Scotland was investigated. The retail outlets or shops are considered as nodes, with the value of sales taken to be the weights on the nodes, while distances between nodes are regarded as the weights on the edges. For our sample, a 30-mile radius coverage of shops from the existing DC was taken, and this gives 63 nodes all connected by distances (see Fig. 2 below). The nearest DC to this existing one is some 171 miles away, so our coverage for this purpose is of 60 miles diameter, although this could be extended in future. Out of the community of 63 shops, the Central and Lothian Counties accommodated 43 of these shops, while Glasgow city and Edinburgh have 30 of these. The existing DC at Livingston is actually situated in between these two cities. The cliques of shops within Glasgow and Edinburgh were examined, and the most central nodes from the two cliques were considered for the prediction of the new DC.

B. Clique Structure/Node-Weight Modulated Centrality Measures

The first case study was the region of Scotland and the second was the region of Northern Ireland. As depicted in Fig. 2 below, the clique of a graph is considered, from among which the most central of the nodes is taken to be representative of that clique, which in turn is considered for the prediction test along with the other cliques. From the Node-Weighted Centrality Measure, the two nodes 5 and 22 are the most central in terms of node-weightedness, thereby representing the cliques of Glasgow and Edinburgh respectively.

Fig. 2. Figure showing the coverage of the 30miles radius of Scotland, cliques at Glasgow & Edinburgh ( Source : www.rightmove.co.uk )


Fig. 3. Figure showing the cliques at Northern Ireland cities (Londonderry, NewtonAbbey & Belfast)

Tables II & III below show the respective results obtained when the Node-Weight Modulated Centrality Measures are applied to the 7 nodes of Glasgow and the 23 nodes of Edinburgh from the supply chain management dataset.

III. IMPLEMENTATION

The initial datasets of the distances between the 30 sales outlets of the Glasgow and Edinburgh cliques were presented as a 7 x 7 square matrix and a 23 x 23 square matrix respectively. These were obtained from the UCINET and tnet software, saved in Excel format and later imported into UCINET for the centrality calculations; see fig. 4 below.

The results came out as text files listing the different columns for each centrality measure. To calculate the node-weight modulated centrality, the values from the text files were exported into Excel, where a column was created for the weights on the nodes. Tables II & III below depict the node-weighted centrality measures at different values of alpha (in terms of degree, closeness, eigenvector and betweenness).

Fig. 4. Figure showing the implementation of node-weighted centrality measure to the cliques of SCM

The radius of coverage according to fig. 1 is 30 miles; accordingly, the farthest possible distance between any two nodes is the diameter of that circle, which is 60 miles. This was used to calculate the ratio of the distance from any node to the farthest distance apart. The percentage error of prediction is therefore calculated by multiplying this ratio by 100, and from this emerges the percentage accuracy.

IV. CLIQUE/NODE-WEIGHTED MEASURE APPLIED TO THE SUPPLY CHAIN MANAGEMENT (SCM)

x is the proportional distance to the proposed Distribution Centre. TSV = Total Sales Values.

The driving distance between node 5 (representing the Glasgow clique) and node 22 (representing the Edinburgh clique) is 42.8 miles.

(1 − x)/x = 41270/72743
x = 0.36 (i.e. 36% of 42.8), which is 13.1 miles

If x is some 13.1 miles away from the highest sales-valued node 22 (Edinburgh), and the existing DC is 15.4 miles away from the same node 22, the predicted DC will be 2.3 miles away from the existing DC; hence

The error rate of the predicted DC = (2.3/42.8) x 100 = 5.37%, i.e. the percentage accuracy of the prediction = 94.63%


With all the results above, one is now in a position to predict the most probable regions (with respect to the nodes) that could serve as a distribution centre for all other outlets, considering their node-weighted centrality and clique structures and going by the percentage accuracy of the prediction.

A similar argument is extended to some 51 shops in Northern Ireland, where three cliques are considered: the cliques at Belfast (14 shops), Londonderry (three shops) and NewtonAbbey (four shops). See fig. 3 above. The centre of mass of the triangle formed by the three cliques was taken to be the predicted Distribution Centre, while the angles at the nodes were calculated using the cosine rule; see fig. 5 below:

Fig. 5. Figure showing the angles of triangle BNL, distances apart of the nodes and sales values of each clique

Since only the distances between the shops are available, the co-ordinates were arrived at from the calculated angles. See fig. 6.

Fig. 6. Figure showing the co-ordinates of the triangle

From the figure above, B(0,0) indicates the origin, whereby x1 = 0 and y1 = 0.

a1 = 74.2 cos 46.4° = 51.17
h = 74.2 sin 46.4° = 53.73

The co-ordinates of the centre of mass for the triangle BNL are calculated using facts from fig. 6 above, whereby x1 = 0; x2 = 7.4; x3 = 51.17; y1 = 0; y2 = 0; and y3 = 53.73.

Total sales value at clique B (w1 = 88732); at clique N (w2 = 18929); and at clique L (w3 = 16279).

Hence, for the predicted DC (the centre of mass), the co-ordinates are

$$x_{cm} = \frac{1}{n} \cdot \frac{\sum_{i=1}^{n} m_i x_i}{M} \quad (11)$$

$$y_{cm} = \frac{1}{n} \cdot \frac{\sum_{i=1}^{n} m_i y_i}{M} \quad (12)$$

where $M = \sum_{i=1}^{n} m_i$ (the total weights on the nodes) and n = number of nodes/vertices.
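Equations (11) and (12) can be evaluated directly from the triangle data given above. The sketch below places vertex L from its 74.2-unit distance to B and the 46.4° angle at B, then applies the two equations as written, including the additional division by n. The function name `weighted_centre` is illustrative only.

```python
import math


def weighted_centre(points, weights):
    """Apply equations (11)-(12): (1/n) * sum(m_i * coordinate_i) / M per axis."""
    n = len(points)
    total_weight = sum(weights)  # M, the total weight on the nodes
    x_cm = sum(m * x for m, (x, _) in zip(weights, points)) / total_weight / n
    y_cm = sum(m * y for m, (_, y) in zip(weights, points)) / total_weight / n
    return x_cm, y_cm


# Vertex L from its distance to B (74.2) and the angle at B (46.4 degrees)
angle_at_b = math.radians(46.4)
vertex_l = (74.2 * math.cos(angle_at_b), 74.2 * math.sin(angle_at_b))
vertex_b, vertex_n = (0.0, 0.0), (7.4, 0.0)

# Clique sales values w1, w2, w3 act as the weights m_i
x_cm, y_cm = weighted_centre([vertex_b, vertex_n, vertex_l], [88732, 18929, 16279])

print(round(vertex_l[0], 2), round(vertex_l[1], 2))
print(round(x_cm, 2), round(y_cm, 2))
```

Rounded to two decimals, the vertex comes out at (51.17, 53.73) and the weighted centre at (2.62, 2.35).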


Substituting the values from fig. 6 above, equations (11) and (12) become respectively,

$$x_{cm} = \frac{1}{3}\left[\frac{0 \times 88732}{88732+18929+16279} + \frac{7.4 \times 18929}{88732+18929+16279} + \frac{51.17 \times 16279}{88732+18929+16279}\right]$$

therefore, $x_{cm} = 2.62$. Similarly, $y_{cm} = 2.35$.

So the predicted DC has co-ordinates (2.62, 2.35); hence its distance from clique B, as shown in fig. 7, is

$$BP = \left(2.35^2 + 2.62^2\right)^{1/2} = 3.52$$

From fig. 3 above, the existing DC is 6.3 miles from the clique at Belfast, so the error is 6.3 – 3.52 = 2.78; that is, the percentage error is (2.78/73.6) × 100 ≈ 3.8%, the farthest distance apart from the existing DC being 73.6. Therefore the percentage accuracy for this prediction is 96.2%.

Fig. 7. Figure showing the carved out portion of triangle BPH from triangle BNL of fig.6 above. P is the predicted DC
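The last step of the Northern Ireland calculation, from the predicted co-ordinates to the accuracy figure, is short enough to check by hand. The sketch below uses only values stated in the text; the variable names are illustrative.

```python
import math

# Predicted DC co-ordinates, with clique B at the origin
x_cm, y_cm = 2.62, 2.35

# Distance BP from clique B to the predicted DC, rounded as in the text
bp = round(math.hypot(x_cm, y_cm), 2)

# Existing DC is 6.3 miles from Belfast; 73.6 is the farthest distance apart
error_pct = (6.3 - bp) / 73.6 * 100
accuracy_pct = 100 - error_pct

print(bp, round(error_pct, 1), round(accuracy_pct, 1))
```

This gives BP = 3.52, an error of about 3.8% and an accuracy of about 96.2%.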

TABLE I. RESULTS OF WEIGHTED MARKING METHOD APPLIED TO SCOTLAND DATASETS

| Node No | Units Sold | Store Format | Town/City | Distance to Existing DC | C1: Point of Evaluation | %age of C1 (PT × WT Factor) | C2: Expansion Capacity | %age of C2 (PT × WT Factor) | C3: Availability of W/Force | C3: Point of Evaluation | %age of C3 (PT × WT Factor) | C4: Standard of Living | C4: Point of Evaluation | %age of C4 (PT × WT Factor) | Composite Point | Product of Dist. to DC & Units Sold |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 4390 | Extra | GLASGOW | 29.5 | 1 | 0.25 | 8 | 2 | VH | 9 | 2.25 | 852 | 7 | 1.75 | 6.25 | 129505 |
| 6 | 9300 | Extra | GLASGOW | 29 | 2 | 0.5 | 8 | 2 | VH | 9 | 2.25 | 852 | 7 | 1.75 | 6.5 | 269700 |
| 8 | 10790 | Extra | GLASGOW | 30 | 1 | 0.25 | 8 | 2 | FH | 7 | 1.75 | 860 | 7 | 1.75 | 5.75 | 323700 |
| 55 | 14780 | Extra | EDINBURGH | 15 | 9 | 2.25 | 8 | 2 | - | 0 | 0 | 939 | 9 | 2.25 | 6.5 | 221700 |
| 56 | 1110 | Metro | EDINBURGH | 20.3 | 6 | 1.5 | 4 | 1 | VL | 3 | 0.75 | 867 | 7 | 1.75 | 5 | 22533 |
| 57 | 1850 | Metro | EDINBURGH | 17.1 | 7 | 1.75 | 4 | 1 | LST | 1 | 0.25 | 939 | 9 | 2.25 | 5.25 | 31635 |
| 64 | 5320 | Superstore | EDINBURGH | 20.3 | 6 | 1.5 | 6 | 1.5 | FH | 7 | 1.75 | 860 | 7 | 1.75 | 6.5 | 107996 |
| 65 | 10480 | Superstore | EDINBURGH | 17.9 | 7 | 1.75 | 6 | 1.5 | FH | 7 | 1.75 | 860 | 7 | 1.75 | 6.75 | 187592 |
| 66 | 9400 | Superstore | EDINBURGH | 13.1 | 9 | 2.25 | 6 | 1.5 | VH | 9 | 2.25 | 852 | 7 | 1.75 | 7.75 | 123140 |
| 67 | 6150 | Superstore | EDINBURGH | 21.2 | 5 | 1.25 | 6 | 1.5 | FH | 7 | 1.75 | 860 | 7 | 1.75 | 6.25 | 130380 |
| 71 | 130 | Express | EDINBURGH | 19 | 7 | 1.75 | 2 | 0.5 | FH | 7 | 1.75 | 810 | 5 | 1.25 | 5.25 | 2470 |
| 72 | 690 | Express | EDINBURGH | 19.4 | 6 | 1.5 | 2 | 0.5 | FH | 7 | 1.75 | 860 | 7 | 1.75 | 5.5 | 13386 |
| 73 | 200 | Express | EDINBURGH | 25.8 | 3 | 0.75 | 2 | 0.5 | VH | 9 | 2.25 | 852 | 7 | 1.75 | 5.25 | 5160 |
| 74 | 230 | Express | EDINBURGH | 15.8 | 8 | 2 | 2 | 0.5 | VL | 3 | 0.75 | 899 | 7 | 1.75 | 5 | 3634 |
| 75 | 260 | Express | EDINBURGH | 17.1 | 7 | 1.75 | 2 | 0.5 | FH | 7 | 1.75 | 860 | 7 | 1.75 | 5.75 | 4446 |
| 77 | 890 | Express | EDINBURGH | 22.7 | 5 | 1.25 | 2 | 0.5 | FH | 7 | 1.75 | 860 | 7 | 1.75 | 5.25 | 20203 |
| 78 | 390 | Express | EDINBURGH | 17.7 | 7 | 1.75 | 2 | 0.5 | FH | 7 | 1.75 | 860 | 7 | 1.75 | 5.75 | 6903 |
| 79 | 70 | Express | EDINBURGH | 14.1 | 9 | 2.25 | 2 | 0.5 | VL | 3 | 0.75 | 860 | 7 | 1.75 | 5.25 | 987 |
| 80 | 320 | Express | EDINBURGH | 20.8 | 6 | 1.5 | 2 | 0.5 | FH | 7 | 1.75 | 860 | 7 | 1.75 | 5.5 | 6656 |
| 82 | 520 | Express | EDINBURGH | 16.5 | 8 | 2 | 2 | 0.5 | LO | 5 | 1.25 | 860 | 7 | 1.75 | 5.5 | 8580 |
| 84 | 700 | Express | EDINBURGH | 21.3 | 5 | 1.25 | 2 | 0.5 | FH | 7 | 1.75 | 860 | 7 | 1.75 | 5.25 | 14910 |
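The scoring behind Table I combines equation (9) with the selection rule of equation (10). The sketch below reruns it on four rows copied from the table. The weighting factor of 0.25 per criterion is read off the table's PT × WT columns, and the helper names are illustrative rather than part of the original method description.

```python
WEIGHT = 0.25  # weighting factor per criterion, as implied by the PT x WT columns
THRESHOLD = 5  # minimum composite point for a site to be considered

# (node, units sold, distance to existing DC, points for C1..C4), from Table I
rows = [
    (66, 9400, 13.1, (9, 6, 9, 7)),
    (74, 230, 15.8, (8, 2, 3, 7)),
    (79, 70, 14.1, (9, 2, 3, 7)),
    (84, 700, 21.3, (5, 2, 7, 7)),
]


def composite_point(points, weight=WEIGHT):
    """Equation (9): sum of point x weighting factor over the four criteria."""
    return sum(p * weight for p in points)


def volume_distance(units_sold, distance):
    """Equation (10) operand: sales volume multiplied by distance."""
    return units_sold * distance


# Keep sites that meet the threshold, then pick the smallest volume x distance
candidates = [r for r in rows if composite_point(r[3]) >= THRESHOLD]
winner = min(candidates, key=lambda r: volume_distance(r[1], r[2]))

print(winner[0], composite_point(winner[3]), round(volume_distance(winner[1], winner[2])))
```

On these rows the minimum product is 987, which selects node 79 with a composite point of 5.25, matching the winner named in the text.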


TABLE II. TABLE SHOWING THE NODE-WEIGHTED CENTRALITIES AT DIFFERENT ALPHA AND AT BETA=1 FOR GLASGOW (CLIQUE1)

| Node | Degree α=¼ | Degree α=½ | Degree α=¾ | Degree α=1¼ | Degree α=1½ | Degree α=1¾ | Eigenvector α=¼ | Eigenvector α=½ | Eigenvector α=¾ | Eigenvector α=1¼ | Eigenvector α=1½ | Eigenvector α=1¾ | Betweenness α=¼ | Betweenness α=½ | Betweenness α=¾ | Betweenness α=1¼ | Betweenness α=1½ | Betweenness α=1¾ | Closeness α=¼ | Closeness α=½ | Closeness α=¾ | Closeness α=1¼ | Closeness α=1½ | Closeness α=1¾ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 44871.4 | 63609.9 | 90173.7 | 181213.3 | 256888.7 | 364166.5 | 51495.5 | 83776.9 | 136294.6 | 360734.7 | 586870.9 | 954766.6 | 39265.4 | 48708.6 | 60422.8 | 92980.5 | 115342.0 | 143081.3 | 19479.8 | 11988.2 | 7377.7 | 2794.2 | 1719.6 | 1058.3 |
| 3 | 97680.6 | 141956.1 | 206300.4 | 435704.2 | 633195.2 | 920202.8 | 111193.7 | 183949.4 | 304310.2 | 832822.6 | 1377750.4 | 2279232.3 | 86712.6 | 111867.1 | 144318.5 | 240193.9 | 309871.7 | 399762.4 | 42468.9 | 26833.7 | 16954.7 | 6768.7 | 4276.8 | 2702.2 |
| 4 | 65359.7 | 103170.8 | 162855.8 | 405785.6 | 640535.6 | 1011090.4 | 73621.3 | 130901.3 | 232747.0 | 735808.2 | 1308292.6 | 2326189.8 | 59336.6 | 85032.0 | 121854.5 | 250242.3 | 358608.2 | 513901.2 | 27718.2 | 18555.2 | 12421.3 | 5566.4 | 3726.3 | 2494.4 |
| 5 | 126771.6 | 201788.8 | 321197.7 | 813810.0 | 1295383.4 | 2061928.6 | 145086.8 | 264307.3 | 481493.6 | 1597913.3 | 2910948.7 | 5302930.0 | 103780.1 | 135232.6 | 176217.4 | 299215.2 | 389898.1 | 508064.2 | 55037.5 | 38033.9 | 26283.4 | 12551.8 | 8674.0 | 5994.2 |
| 11 | 36548.1 | 58714.3 | 94324.3 | 243434.6 | 391076.5 | 628262.6 | 41543.2 | 75860.3 | 138525.5 | 461912.0 | 843478.6 | 1540241.6 | 31720.4 | 44227.5 | 61666.1 | 119882.1 | 167150.6 | 233056.8 | 15691.6 | 10823.0 | 7465.0 | 3551.3 | 2449.5 | 1689.5 |
| 13 | 3896.4 | 5544.0 | 7888.5 | 15970.6 | 22724.1 | 32333.5 | 4465.1 | 7280.7 | 11871.6 | 31563.2 | 51465.8 | 83918.1 | 3290.5 | 3953.8 | 4750.9 | 6859.5 | 8242.4 | 9904.0 | 1690.3 | 1043.4 | 644.1 | 245.4 | 151.5 | 93.5 |
| 15 | 3199.0 | 4617.4 | 6664.5 | 13883.9 | 20039.3 | 28923.8 | 3662.5 | 6052.2 | 10001.1 | 27309.7 | 45128.5 | 74573.5 | 2742.1 | 3392.4 | 4196.9 | 6423.7 | 7947.2 | 9832.0 | 1421.0 | 911.0 | 584.1 | 240.1 | 153.9 | 98.7 |
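The role of α in the degree columns can be illustrated with the tuning-parameter degree of Opsahl et al. [6], k^(1−α) · s^α, where k is the number of ties and s the total tie weight. The sketch below rests on two assumptions, since equation (6) is not restated in this part of the paper: that the paper's measure takes this form, and that β = 1 multiplies the result by the node weight. The outlet values used are hypothetical.

```python
ALPHAS = (0.25, 0.5, 0.75, 1.25, 1.5, 1.75)  # the alpha values used in the tables


def node_weighted_degree(ties, tie_weight_sum, node_weight, alpha):
    """Opsahl-style degree k**(1 - alpha) * s**alpha, scaled by the node weight.

    Scaling by node_weight stands in for beta = 1; this is an assumption,
    not a formula quoted from the paper.
    """
    return node_weight * ties ** (1 - alpha) * tie_weight_sum ** alpha


# Hypothetical outlet: 6 ties, 24.0 miles of total tie weight, sales value 5000
values = [node_weighted_degree(6, 24.0, 5000, a) for a in ALPHAS]

for alpha, value in zip(ALPHAS, values):
    print(f"alpha={alpha}: {value:.1f}")
```

With α = 0 the expression reduces to node weight times the number of ties, and with α = 1 to node weight times the total tie weight. Each step of 0.25 in α multiplies the value by the constant factor (s/k)^0.25.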

TABLE III. TABLE SHOWING THE NODE-WEIGHTED CENTRALITIES AT DIFFERENT ALPHA AND AT BETA=1 FOR EDINBURGH (CLIQUE2)

| Node | Degree α=¼ | Degree α=½ | Degree α=¾ | Degree α=1¼ | Degree α=1½ | Degree α=1¾ | Eigenvector α=¼ | Eigenvector α=½ | Eigenvector α=¾ | Eigenvector α=1¼ | Eigenvector α=1½ | Eigenvector α=1¾ | Betweenness α=¼ | Betweenness α=½ | Betweenness α=¾ | Betweenness α=1¼ | Betweenness α=1½ | Betweenness α=1¾ | Closeness α=¼ | Closeness α=½ | Closeness α=¾ | Closeness α=1¼ | Closeness α=1½ | Closeness α=1¾ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 22 | 441646.1 | 485428.6 | 533551.6 | 644582.4 | 708483.1 | 778718.5 | 453097.5 | 510928.4 | 576140.5 | 732596.9 | 826101.5 | 931540.6 | 303108.9 | 228651.4 | 172484.2 | 98152.2 | 74041.5 | 55853.5 | 203579.3 | 103143.9 | 52258.1 | 13414.5 | 6796.5 | 3443.5 |
| 23 | 32232.5 | 30416.3 | 28702.4 | 25558.9 | 24118.7 | 22759.7 | 33870.3 | 33585.7 | 33303.6 | 32746.4 | 32471.3 | 32198.5 | 20634.5 | 12465.4 | 7530.4 | 2748.1 | 1660.2 | 1002.9 | 15261.9 | 6819.2 | 3046.9 | 608.3 | 271.8 | 121.4 |
| 24 | 61298.6 | 65972.5 | 71002.7 | 82243.2 | 88514.1 | 95263.1 | 63165.7 | 70052.7 | 77690.6 | 95555.4 | 105973.8 | 117528.2 | 42051.1 | 31046.7 | 22922.1 | 12494.9 | 9225.1 | 6811.0 | 28001.3 | 13766.3 | 6768.0 | 1635.8 | 804.2 | 395.4 |
| 25 | 74206.1 | 69365.2 | 64840.2 | 56656.4 | 52960.4 | 49505.5 | 78587.1 | 77797.5 | 77015.8 | 75475.8 | 74717.4 | 73966.6 | 46833.4 | 27629.6 | 16300.2 | 5673.2 | 3346.9 | 1974.5 | 36435.0 | 16722.4 | 7675.0 | 1616.8 | 742.0 | 340.6 |
| 30 | 144722.5 | 138988.2 | 133481.0 | 123112.8 | 118234.7 | 113549.9 | 152807.8 | 154952.0 | 157126.2 | 161566.5 | 163833.5 | 166132.3 | 92132.9 | 56329.4 | 34439.4 | 12873.5 | 7870.7 | 4812.1 | 69473.0 | 32028.6 | 14765.9 | 3138.4 | 1446.9 | 667.0 |
| 31 | 318409.5 | 338372.0 | 359586.2 | 406087.9 | 431547.4 | 458603.1 | 328540.4 | 360246.8 | 395013.1 | 474935.1 | 520769.5 | 571027.4 | 215012.3 | 154294.1 | 110722.3 | 57017.4 | 40916.0 | 29361.6 | 143261.1 | 68498.2 | 32751.4 | 7487.4 | 3580.0 | 1711.7 |
| 32 | 290386.7 | 330442.0 | 376022.5 | 486912.5 | 554076.2 | 630504.3 | 297308.6 | 346383.0 | 403557.8 | 547777.5 | 638195.0 | 743537.0 | 198860.2 | 154966.4 | 120761.2 | 73334.2 | 57147.4 | 44533.4 | 132057.8 | 68339.2 | 35365.2 | 9470.8 | 4901.1 | 2536.3 |
| 33 | 157564.6 | 157629.0 | 157693.5 | 157822.4 | 157886.9 | 157951.5 | 165476.9 | 173857.7 | 182662.9 | 201633.7 | 211845.6 | 222574.7 | 102367.7 | 66534.2 | 43244.1 | 18268.0 | 11873.3 | 7717.1 | 74647.8 | 35379.6 | 16768.3 | 3766.7 | 1785.2 | 846.1 |
| 35 | 3229.5 | 3048.7 | 2878.0 | 2564.8 | 2421.2 | 2285.7 | 3412.6 | 3404.2 | 3395.9 | 3379.3 | 3371.0 | 3362.7 | 2006.6 | 1177.0 | 690.3 | 237.5 | 139.3 | 81.7 | 1575.7 | 725.8 | 334.3 | 70.9 | 32.7 | 15.0 |
| 36 | 19942.1 | 18730.5 | 17592.4 | 15519.5 | 14576.5 | 13690.9 | 21128.7 | 21025.6 | 20923.1 | 20719.5 | 20618.5 | 20518.0 | 12434.6 | 7282.3 | 4264.8 | 1462.8 | 856.7 | 501.7 | 9734.5 | 4463.0 | 2046.2 | 430.1 | 197.2 | 90.4 |
| 37 | 5036.8 | 5554.6 | 6125.7 | 7450.1 | 8216.0 | 9060.8 | 5285.0 | 6115.7 | 7076.9 | 9476.3 | 10965.7 | 12689.2 | 3247.1 | 2308.6 | 1641.3 | 829.6 | 589.8 | 419.4 | 2362.2 | 1221.8 | 631.9 | 169.1 | 87.4 | 45.2 |
| 38 | 8771.0 | 9543.9 | 10384.8 | 12295.4 | 13378.8 | 14557.6 | 8971.3 | 9984.7 | 11112.6 | 13764.9 | 15319.7 | 17050.2 | 6080.9 | 4587.3 | 3460.6 | 1969.4 | 1485.7 | 1120.8 | 4026.5 | 2011.3 | 1004.7 | 250.7 | 125.2 | 62.6 |
| 39 | 7616.4 | 7280.0 | 6958.4 | 6357.3 | 6076.5 | 5808.1 | 7943.9 | 7919.5 | 7895.2 | 7846.8 | 7822.7 | 7798.7 | 5015.5 | 3156.9 | 1987.0 | 787.2 | 495.5 | 311.9 | 3504.2 | 1541.1 | 677.7 | 131.1 | 57.6 | 25.3 |
| 40 | 4385.8 | 4936.9 | 5557.3 | 7041.8 | 7926.7 | 8922.8 | 4531.7 | 5270.7 | 6130.4 | 8293.1 | 9645.7 | 11218.9 | 2911.0 | 2174.9 | 1624.9 | 907.0 | 677.7 | 506.3 | 1983.8 | 1010.1 | 514.3 | 133.3 | 67.9 | 34.6 |
| 41 | 26861.6 | 25396.9 | 24012.1 | 21464.9 | 20294.5 | 19187.9 | 28334.9 | 28259.3 | 28183.9 | 28033.6 | 27958.7 | 27884.1 | 16569.1 | 9663.1 | 5635.5 | 1916.7 | 1117.8 | 651.9 | 13075.1 | 6017.3 | 2769.3 | 586.5 | 269.9 | 124.2 |
| 42 | 10722.5 | 10145.6 | 9599.8 | 8594.6 | 8132.2 | 7694.6 | 11302.1 | 11272.0 | 11242.1 | 11182.4 | 11152.7 | 11123.0 | 6669.1 | 3924.8 | 2309.8 | 800.0 | 470.8 | 277.1 | 5096.7 | 2292.2 | 1030.9 | 208.5 | 93.8 | 42.2 |
| 43 | 2156.3 | 2525.2 | 2957.1 | 4055.1 | 4748.7 | 5560.9 | 2201.3 | 2631.5 | 3145.8 | 4495.6 | 5374.3 | 6424.6 | 1496.0 | 1215.4 | 987.5 | 651.8 | 529.5 | 430.2 | 985.2 | 527.1 | 282.0 | 80.7 | 43.2 | 23.1 |
| 44 | 8955.4 | 9032.3 | 9109.8 | 9266.8 | 9346.3 | 9426.5 | 9406.4 | 9964.9 | 10556.6 | 11847.4 | 12550.8 | 13296.0 | 5799.9 | 3788.5 | 2474.6 | 1055.8 | 689.7 | 450.5 | 4232.4 | 2017.5 | 961.7 | 218.5 | 104.2 | 49.6 |
| 45 | 4458.9 | 4768.9 | 5100.5 | 5834.4 | 6240.1 | 6674.0 | 4698.1 | 5294.3 | 5966.3 | 7576.8 | 8538.3 | 9622.0 | 2853.0 | 1952.4 | 1336.1 | 625.7 | 428.2 | 293.0 | 2122.9 | 1081.0 | 550.5 | 142.7 | 72.7 | 37.0 |
| 46 | 16828.7 | 16593.2 | 16361.0 | 15906.2 | 15683.6 | 15464.1 | 17504.4 | 17952.4 | 18411.8 | 19366.2 | 19861.8 | 20370.1 | 11239.3 | 7401.2 | 4873.8 | 2113.5 | 1391.8 | 916.5 | 7774.9 | 3541.8 | 1613.4 | 334.8 | 152.5 | 69.5 |
| 47 | 4002.5 | 3700.1 | 3420.6 | 2923.3 | 2702.4 | 2498.3 | 4229.0 | 4130.8 | 4034.8 | 3849.6 | 3760.1 | 3672.8 | 2509.0 | 1454.0 | 842.6 | 282.9 | 164.0 | 95.0 | 1923.8 | 854.8 | 379.8 | 75.0 | 33.3 | 14.8 |
| 48 | 21634.7 | 21380.2 | 21128.7 | 20634.6 | 20391.9 | 20152.0 | 22823.0 | 23793.4 | 24805.0 | 26959.2 | 28105.4 | 29300.4 | 13690.8 | 8561.9 | 5354.4 | 2094.1 | 1309.6 | 819.0 | 10453.0 | 4991.1 | 2383.1 | 543.3 | 259.4 | 123.9 |
| 49 | 16629.7 | 15399.1 | 14259.6 | 12227.4 | 11322.6 | 10484.7 | 17663.5 | 17373.2 | 17087.6 | 16530.6 | 16258.9 | 15991.7 | 10172.6 | 5762.3 | 3264.0 | 1047.3 | 593.3 | 336.0 | 8209.5 | 3752.8 | 1715.5 | 358.5 | 163.9 | 74.9 |

V. CONCLUSION

Two case studies were considered in supply chain management. The first considered two cliques a horizontal distance apart (the Glasgow and Edinburgh cliques in Scotland), while the second considered the triangular arrangement of cliques at Londonderry, NewtonAbbey and Belfast in Northern Ireland.

The results obtained show that the combined weights have an obvious effect on the centralities of the nodes considered, as evidenced in the case studies of the Supply Chain Management (SCM). The tuning parameter alpha (whose values range between 0.25 and 1.75) acts as the bound for the relative importance of number of ties versus weight of ties, and the tuning parameter beta (whose values are -1 and +1) serves as a multiplicative/dividing factor for the weights of nodes.

Graphs in the SCM were considered, and the effects of the combined weights on edges (distance between shops) and weights on nodes (sales value for SCM) were evaluated, taking the betweenness, closeness, eigenvector and degree centrality into cognisance. The resulting node-weight modulated centrality was then applied to the sales dataset while introducing an additional tuning parameter β, thereby making use of two parameters, α and β.

The resulting prediction was 94.6% accurate for the Scotland cliques, compared with the accuracy of 76.5% obtained with the Weighted Marking Method, while 96.2% accuracy was obtained in the case study involving Northern Ireland with the clique/node-weighted centrality measure.

VI. FUTURE STUDIES

The links/edges between nodes might not just be road linkage; two attributes might be considered, e.g. communication bandwidth between two nodes and the physical distance between them could be combined in future.

This model could be further extended to other datasets, such as in the area of disease control, whereby the model can be used to detect the most central region where epidemic diseases are prone to spread easily, or to find the group in society most vulnerable to an epidemic disease. Here the node weight could be the preponderance of an infectious disease in a particular node, and the edge weight would be the distance of highly infected nodes from other nodes in such a graph.

ACKNOWLEDGEMENT

We wish to acknowledge Dunnhumby, United Kingdom (www.dunnhumby.com) for providing the TESCO datasets that were used for the purpose of this research.

REFERENCES

[1] A.A.G. Akanmu, F.Z. Wang & A.F. Yamoah (2014). Clique Structure and Node-Weighted Centrality Measures for Predicting Distribution Centre Location in the Supply Chain Management. IEEE Technically Co-Sponsored Science and Information Conference, Aug. 27-29, 2014, London, UK.
[2] A. Barrat, M. Barthelemy, R. Pastor-Satorras & A. Vespignani (2004). The Architecture of Complex Weighted Networks. Proceedings of the National Academy of Sciences 101(11), 3747-3752. arXiv:cond-mat/0311416.
[3] M.S. Granovetter (1973). The Strength of Weak Ties. American Journal of Sociology, University of Chicago Press, pp. 1360-1380.
[4] U. Brandes (2001). A Faster Algorithm for Betweenness Centrality. Journal of Mathematical Sociology 25, 163-177.
[5] M.E.J. Newman (2001). Scientific Collaboration Networks. II. Shortest Paths, Weighted Networks, and Centrality. The American Physical Society. Physical Review E, Volume 64, 016132.
[6] T. Opsahl, F. Agneessens & J. Skvoretz (2010). Node Centrality in Weighted Networks: Generalizing Degree and Shortest Paths. Social Networks 32 (2010) 245-251. Elsevier B.V.


[7] H. Zhuge & J. Zhang (2010). Topological Centrality and Its e-Science Applications. Wiley Interscience.
[8] V.V. Thai & D. Grewal (2005). Selecting the Location of Distribution Centre in Logistics Operations: A Conceptual Framework and Case Study. Asia Pacific Journal of Marketing and Logistics, 17(3), 3-24.
[9] B.V. Carolan (2014). Social Network Analysis and Education: Theory, Methods & Applications. SAGE Publications, USA. pp. 120.
[10] Mapdata (2014). Properties on Map. Accessed 26 December, 2014 from http://www.rightmove.co.uk/commercial-property-to-let/map.html?locationIdentifier=REGION%5E815&insId=1&radius=40.0


Social Networks’ Benefits, Privacy, and Identity Theft: KSA Case Study

Ahmad A. Al-Daraiseh, Afnan S. Al-Joudi, Hanan B. Al-Gahtani, Maha S. Al-Qahtani
Dept. of Information Systems, King Saud University, KSA
Riyadh, Saudi Arabia

Abstract—Privacy breaches and Identity Theft cases are increasing at an alarming rate. Social Networking Sites (SN’s) are making it worse. Facebook (FB), Twitter and other SN’s offer attackers a wide and easily accessible platform. Privacy in the Kingdom of Saudi Arabia (KSA) is extremely important due to cultural beliefs, besides the other typical reasons. In this research we comprehensively cover Privacy and Identity Theft in SN’s from many aspects, such as methods of stealing, contributing factors, ways to use stolen information, and examples. A study of the local community was also conducted. In the survey, the participants were asked about privacy on SN’s, SN’s privacy policies, and whether they think that SN’s benefits outweigh their risks. A social experiment was also conducted on FB and Twitter to show how fragile the systems are and how easy it is to gain access to private profiles. Results from the survey are scary: 43% of all the accounts are public, 76% of participants do not read the policies, and almost 60% believe that the benefits of SN’s outweigh their risks. Not too far from this, the results of the experiment show that it is extremely easy to obtain information from private accounts on FB and Twitter.

Keywords—social network; identity theft; fraud; privacy; fake identities

I. INTRODUCTION

Computer technology and the Internet have become essential necessities of modern life; they provide knowledge, communications, entertainment, and a means for sharing. Nowadays, almost everyone is connected. People have become so addicted to this technology that it is now a challenge to stay away from it for a while. Perhaps the most attractive services for many are those provided by social networks.

Social Networks (SN) are Internet-based services that allow people to interact, express and share their ideas and thoughts in multiple formats, such as text, images, audio, and video. Although still young, SN’s have gained large popularity. As of Sep 2014, Facebook (FB) prides itself on having more than 860 million daily active users [1]. When using SN’s, different people share different amounts of their personal information. Having our personal information fully or partially exposed to the public makes us ideal targets for different types of attacks, the worst of which could be identity theft.

Identity theft occurs when someone uses another person’s information for a personal gain or goal. During the past years, online identity theft has been a major problem since it has affected millions of people around the world [2]. Victims of identity theft may suffer different types of consequences; for example, they might lose time or money, get sent to jail, get their public image ruined, or have their relationships with friends and family broken.

Today, the majority of SN’s do not verify normal users’ accounts and have very weak privacy and security policies. In fact, most SN’s applications default their settings to minimal privacy; hence, SN’s have become an ideal platform for fraud and abuse. Social Networking services have facilitated identity theft and impersonation attacks for serious as well as naive attackers. To make things worse, users are required to provide accurate information to establish an account on Social Networking sites. Simple monitoring of what users share online could lead to catastrophic losses, let alone what could happen if such accounts were hacked.

In this article, we shed light on the benefits and risks associated with SN’s. We study the local community in the Kingdom of Saudi Arabia (KSA) to see if the benefits of SN’s outweigh their risks and dangers, knowing that Saudi communities are amongst the most protective in the whole world, yet have indulged deeply in the usage of SN’s.

A. KSA at a Glance

The Internet first emerged in KSA in 1990. According to the latest statistics and global studies presented by many competent authorities, KSA currently is in the forefront of international rankings in terms of the use of FB and Twitter. There are 100 thousand Tweets per minute and up to 20 victims of online identity theft. Another study stated that the number of FB users in KSA was 6 million in 2012, and in 2013 the number reached 7.8 million, with five million of them accessing their accounts from their mobiles [3].

Recently, many people in KSA from different communities became outraged by impersonation and hacking attacks that targeted their SN accounts. Most of the attacks aim to smear a


person or ruin his/her image; others tried to disseminate lies and/or misrepresent the facts to acquire a large number of followers. In 2013, “AlSharq Newspaper” published an article stating that 35 persons stole a famous Saudi actor's identity on Twitter. Most of them had more than 30 thousand followers, some of them reached 50 thousand, and a few reached 80 thousand followers. The fake accounts were used to disseminate lies and to publish undesirable images [4].

The laws in KSA set penalties for anyone who commits any cybercrime, such as breaking into websites or accessing material sent over the Internet. Saudi law provides a penalty of up to three million Saudi Riyal and imprisonment of up to four years for any person who gains unauthorized access to private information [5].

The rest of the paper is organized as follows: In section 2, related work is discussed. In section 3, the methodology is presented. In section 4, a comprehensive study of SN’s and Identity Theft is provided. In section 5, results are analyzed. In sections 6 and 7, recommendations and conclusions are listed respectively.

II. RELATED WORK

Previously, several researchers published and highlighted the SN’s benefits and issues. In the following few paragraphs, we provide a summary of the most recent related work.

M. Reznik [6] published a paper about identity theft in SN’s and methods of Internet impersonation. Reznik discussed two methods that an offender may use to steal the identity of victims on the Internet: firstly, the offender creates a fictitious profile of the victim and uses it without the victim’s permission; secondly, the offender gains unauthorized access to the victim’s account by stealing their credentials.

K. Krombholz et al. [7] reported that one of the main issues in social media is information accuracy. Many FB profiles are fake because many users use false information when they create their profiles. FB rules state that the user should provide real information to prevent fake identities. The authors also said that, to prevent users from diverting, FB's priority should be the safety of its users.

F. Stutzman and J. Kramer-Duffield [8] provide advice on how to enhance the privacy of users in SN’s. In order to avoid identity theft, they suggest making users' profiles private for friends only, which will reduce the information theft risks on SN’s.

A. Verma et al. [9] found that the architecture of centralized SN’s such as the current ones does not ensure the privacy and security of the users. Therefore, they proposed a decentralized and distributed architecture that preserves the privacy and security of the users in online SN’s. This architecture is based on decentralized SN’s using “Freedom box” as a personal server. It uses Diaspora as a social platform. Each user therefore has a Freedom box to store his/her personal information. They enhanced privacy and security by the use of cryptographic techniques such as RSA (Rivest–Shamir–Adleman) and digital signatures.

L. Bilge et al. [10] examined how easy it is for an attacker to start automated crawling and identity theft attacks on popular SN’s sites to access users' personal information. They presented two attacks on victims who have public profiles, one for registered users and the other for non-registered users. The first attack is called automated identity theft. In this attack, the authors created cloned victims’ profiles, then sent friend requests to the contacts of those victims. The second attack is an automated cross-site profile cloning attack. In this attack, the attacker can automatically create a forged profile in an SN where the victim is not registered yet, and reach the victim’s friends who are registered on both networks. The experimental results show that the automated attacks are effective and feasible in practice.

M. Al-Mujeb [11] reviews the relationship between KSA culture and privacy and FB privacy risks. The aim was to know how KSA’s cultural privacy norms affect the user's behavior on FB. She concluded that “the most participants who are active users of FB are not aware of many of the privacy risks that can arise from sharing personal information and the potential consequences of this, such as stalking, theft and credit card fraud”. The results confirmed, “The users were unaware of the privacy setting in FB that can help them to protect their privacy”. The main recommendation of the thesis was to encourage the user not to accept strangers as friends, never to share personal information, to limit the information available to “Everyone”, and to read the privacy policy with the terms of use.

R. Demyati [12] also talked about FB privacy issues. She discussed how FB responded to the privacy concerns and how that response affected its users. The main concerns were whether FB shares users’ information with advertisers and who can see users’ photos when their friends tag them. As for FB's response, they agreed to change some of their privacy policy and refused to change other parts of it. She concluded with some recommendations, such as encouraging the user to read the privacy policy, keep sensitive information private, not accept friend requests from strangers, open a new email account for FB, and report any issues to the FB team.

C. Marcum et al. [13] defined identity theft as a type of crime; the high growth of technology has provided new methods to steal the personal information of thousands of victims at once. Indeed, the increased number of users on SN’s sites and the relatively weak security and authentication procedures have exacerbated this problem. The research also suggested that users may not understand the risks associated with sharing personal information, or the potential to use this information to predict highly confidential data like social security numbers.

J. Mali [14] said that in 2012 there were more than 12 million victims of identity theft in the US. Many financial institutions and companies are enforcing measures to protect their customers, but criminals explore new ways to collect sensitive data through SN’s. FB claimed that users' data was safe, while Twitter suffered an attack in which more than 250,000 accounts were affected.

B. Pragides [15] found that most identity theft cases occur to the younger generation because they use SN’s as a


way to communicate with friends and to make new relationships with new people.

To our knowledge, no one has comprehensively studied SN's and identity theft, or provided a field study to measure the impact of risks on users' usage of such services; hence, this paper is unique in that it fully and exclusively covers SN's issues in KSA.

III. METHODOLOGY

In this research, we focus on finding and analyzing the reasons for identity theft in SN's and the prevalent plans for handling it. We discuss the answers to the following questions: To what extent do users trust SN's sites? Are SN's sites a safe place to share personal information? Do the advantages of SN's outweigh the risk of identity theft in KSA? To find and understand the answers, we did a field study to examine the qualitative outcomes along with the quantitative ones by applying three methods:

• Data Collection: We collected the data through an online survey, which was made available to everyone, to evaluate respondents' knowledge of the privacy and security policies of SN's providers and to discover their concerns regarding such services.

• Data Analysis: After gathering the data, we analyzed it to find out the main reasons for identity theft in SN's and the level of personal information that the user can share on SN's.

• Social Experiment: To test the confidence and awareness of SN's users and their knowledge of security and privacy, and to measure the robustness of FB's and Twitter's privacy measures, we designed a social experiment. In this experiment, we cloned public accounts and sent requests to friends of the original account owner who had set their profiles to private.

IV. SOCIAL NETWORKS

Nowadays, SN's gather millions of people who share news, images, and other information. It is clear that SN's have made a significant change in how people communicate and exchange information. An SN is an online service provided by a major company in order to connect users who share the same interests, activities, backgrounds, or real-life connections. Most SN's are websites that offer a range of services to their users, such as instant messaging, private messages, blogging, file sharing, and other services. The most famous SN's currently are Facebook, Tumblr, Twitter, and Google+.

The Social Clinic statistics confirm that in 2013 there were 7.8 million active FB users in KSA, 26% of whom were female and 74% male [16]. This indicates an increase of 1.8 million users since 2012. Twitter prevalence among Internet users in KSA is the highest worldwide, with a total of 5 million active users and 150 million tweets a day. The number of Twitter users has increased by 2 million since 2012 [17].

A. Social Networks' Benefits

SN's provide different benefits based on the way they are utilized. The most famous benefits of using social networks in KSA are:

• Expand the circle of user contacts and acquaintances: SN's give users a way to represent themselves. They allow users to form friendships with people from all over the world. Users might find former friends with whom they had lost contact.

• Social networks in education: SN's can be an excellent tool for education. KSA does not yet utilize SN's properly for educational purposes. Their use in education is limited to individual efforts and to the existence of official profiles representing the Ministry of Education and its leaders.

• Keep in touch with families and friends: SN's allow people to share their daily life in a public way. SN's allow families to share events, images, and videos in real time. Family and friends can watch and experience everything that is done individually, and comment on it. They share the experience rather than being informed over a phone call. A Pew Research report by Aaron Smith [18] shows that 67% of social network users say that the main reason they sign up for SN's is to keep in touch with family members and friends. It is a quick way to communicate with relatives and friends who live in other countries. Some SN's have formed a bridge linking members of displaced families who were separated by wars and by crises such as natural disasters.

• Information gathering and dissemination: SN's serve as a news and media platform that can provide news in real time. A user has both the agility and the mobility to pass information faster than ordinary news platforms. Many Arabs consider SN's a strong competitor to traditional media, as SN's remain the main source of news for millions of Arabs. The Arab Social Media Report 2014 [19] shows that nearly 27.59% of respondents from across the Arab world get their news mainly from SN's. The number of users has increased by 21.59% since 2013.

• Social influence: SN's are useful in formulating and gathering public opinion, as information can spread rapidly. This impact can vary in its direction. SN's played a big role in the revolutions of the Arab Spring. SN's have the power to change people's opinions, form a protest, or intercept a public decision. A report published by Foreign Policy magazine [20] discusses "The role of new media in moving the masses and more specifically the role of SN's". The writers said that "SN's are playing a significant and possibly crucial role in empowering rebels and protesters in ways that couldn't have been imagined before". They added, "SN's may be rebels' favorite weapon, but at the same time research on Syria's revolution confirmed that it can do as much harm as good".

• Finding job opportunities: The role of SN's is not limited to the social side, but also extends to academic
and professional development. Users can search for job opportunities, as the government and other companies post job opportunities available in their area. In KSA, users can find specialized profiles that help them find a job that suits their area of interest and specialization.

• Advertising and generating income: Companies that have profiles on SN's can interact with current events that interest the public and use them in their next ads. The release of the iPhone 6 is the latest event related to this benefit. In line with what mattered to society at the moment, businesses and competing companies took advantage of the bending problem in the iPhone 6 to promote their own products. The influence of SN's on advertising is even stronger for small businesses and individuals who work from home. People usually tend to advise their friends to try a successful product they have used, which helps the individual build a fan base before expanding the business. Just as SN's help make products attractive, they can also destroy a product's reputation for the same reason. There is no room for flattery because SN's are the court of the public.

B. Identity Theft In Social Networks

Identity theft in SN's can be done by manipulating people to get sensitive information or by using the victims' posts on SN's. While SN's promote sharing amongst users, some people over-share. A large problem with SN's is that people tend to share a lot of their personal data on their profiles. Such exaggerated sharing makes it easy for identity thieves to do their job. Moreover, SN's make it easier for attackers to elicit users' personal information and use it illegally [7]. Identity theft in SN's via social engineering is increasing day after day.

Advertising is one of the main reasons why SN's require users' personal information. To understand why SN's are free but still encourage users to provide more information, we must first understand how these sites make profits through the advertising mechanism. Official figures report that 85% of FB profits come from ads [21]. FB makes billions of dollars annually by using the content provided by users for free. The smart system analyzes all the content shared by users in order to build a knowledge base for advertising purposes. Every action taken by the user is used to gather information about him/her; targeted advertising then does the job. Information gathered from users via various platforms (e.g., computers and smartphones) is used to form a social graph for each user, in which the user is at the center of the graph and is connected by edges to all the entities he/she is related to. Thus, with 900 million social graphs, the system searches for similarities between them and extracts general information that helps in directing advertising campaigns. FB sells this information to third-party companies, often advertising agencies, to use in campaigns they intend to launch [22]. Given the age of SN's users, we can identify another reason for the increase in identity theft in SN's. Most users are teenagers and young people, who share much of their personal information online with strangers.

1) Methods of stealing information

SN's provide the biggest platform for the misuse of personal information and thus promote fraud and personal data extraction. Therefore, it is not recommended to share the national identity number, driver's license number, and other important details on such sites, although some websites require this information from the user [23]. However, identity thieves sometimes do not ask directly for this personal data; they search for related sensitive data and then use it in several harmful ways. There are various ways for identity theft to happen [24]:

• Data Breach: Organizations and companies store and share all types of sensitive data about their customers. A "data breach" occurs when any of this information is lost by mistake or exposed by negligent employees.

• Friendly Fraud: A high percentage of identity theft cases involve friends and family. Most of the victims are young adults and college students because they lack sufficient knowledge and experience. Hackers can use SN's, sharing sites, and other shared interests to reach those victims and steal their personal information. For example, an attacker can learn from a relative's FB page that someone is on a trip to Africa. Using this information, the attacker impersonates that person and sends an email to all the relatives and friends asking them to send money because of unforeseen circumstances.

• Computer Hacking: Cybercriminals are expert at breaking into computers or laptops to steal online banking logins or other financial information.

• Dumpster Diving: Dumpster diving is used to retrieve information by searching through someone else's trash for visible treasures, such as access passwords or phone lists written down on sticky notes.

• Skimmers: Devices that blend in with an ATM to collect credit card information when the card is swiped through them. Credit card numbers captured by skimmers are used to purchase things in the name of the original card's owner.

• Stolen Wallet: Identity theft occurs when a wallet is lost or stolen. Criminals look for everything inside the wallet, such as driver's licenses, bank account numbers, insurance information, and other sensitive information.

• Mail Theft: Stealing mail from a mailbox or mailbox panel is an easy way for criminals to steal an identity. They know that the mail may contain approved credit offers, loan statements, and other information that can be used to steal an identity.

• Shoulder Surfing: In public places, the easiest way for someone to see all your confidential and personal information is to look at your screen from behind your back. You never know who is standing behind you or who is
watching you. This method is commonly used to obtain user passwords, PINs, and similar information.

• Account Takeover: In SN's, users who set their accounts to public increase their chances of becoming victims. Attackers seek to steal the personal information of those victims to create fake accounts. An attacker creates a fake account and then starts sending friend requests to the victim's friends. When a request is accepted, the attacker uses different techniques to obtain sensitive information from them.

• Spam Attack: Attackers know that users spend more of their time on SN's than on email, so they send spam through SN's by using fake profiles and spam applications that send spam to the friends of the victims [25].

• Malware: Malware can spread over SN's through malicious URLs or fake profiles. For example, attackers can create a fake account using the name of a famous person, ask victims to contact them, and then send malware to the victims [25].

• Spyware: An undesirable program used to secretly collect and record the activities of a person without that person's knowledge. It is loaded when new programs are downloaded from an unauthorized source.

• Social Engineering: The most prevalent identity theft process in KSA. It is a way to gain access to people's information through fraud and impersonation, without their realizing that they are victims of a security breach. Social engineering succeeds because victims tend to be good people who are keen to trust others and tend spontaneously to help them. Social engineering hacking has many different goals, including fraud, network intrusion, identity theft, disabling systems and networks, and gaining unauthorized access to sensitive information. There are several methods of social engineering hacking. In this paper, we focus on how social engineering is implemented using SN's. Social engineering can be accomplished through the following methods [26]:

• Online Phishing: In online phishing, attackers try to obtain access to the user's sensitive data, such as banking information, by creating a fake website that looks like the authentic site of a specific bank. Then, attackers send emails and messages to people with a link to the fake website, asking them to log in for one reason or another.

• Phone Phishing: It occurs when the victim receives a phone call from people who say that they work for a trusted bank or company and that the account needs to be updated. In order to update the account, they ask for sensitive information for verification.

• Romantic Fraud: This type of fraud frequently occurs through SN's on the Internet. In romantic fraud, attackers rely on stolen images and fake data to lure their victims from among users looking for love and online romance. They continue to deceive victims with stories of love for weeks, and then claim that they need money because of a tragic accident or injury and seek help from the new beloved.

• Spoofing: It occurs when an attacker impersonates someone's identity so that he can steal credit card information and claim that the client is the one using it. Another way of spoofing is when an attacker pretends to be an authorized or prospective party, such as a bank employee or branch, to get personal information from the client.

• Job Posting: A serious technique used in KSA. The attackers are fake companies or persons who post fake, tempting jobs. Desperate unemployed people apply, and their data is extracted and used.

2) Factors contributing to identity theft in SN's

The primary profile data used by attackers to misuse or steal a user's identity are full name, date of birth, hometown, school information, bank accounts, relationship status, and hobbies or interests. Sensitive data, such as your home address, workplace, and the places you visit, can be obtained through GPS-enabled devices. There are many reasons to keep this information private, because providing it is potentially dangerous for the user and can put his/her identity at risk. In KSA, several factors influence and simplify the theft of users' identities on SN's. The main ones are:

• Lack of knowledge on how to protect online identity.

• Lack of knowledge of regulations and cyber laws.

• Overconfidence in the SN providers.

• The enormous expansion in the number of users, along with the emerging need for SN's, is encouraging organizations to generate profits using those sites.

• Shortage of knowledge and awareness concerning the privacy policies given by the SN providers.

• Unemployment is a major factor. A large number of unemployed, well-educated youth exist in KSA. Such young people have excellent IT and hacking skills, in addition to all the time they need to perform sophisticated cybercrimes [27].

Currently, these problems are the main focus, but policies and laws are being prepared to resolve these concerns. There are different methods and solutions to keep us safe from identity theft. Raising awareness and knowledge of identity theft and fraud issues is the easiest and cheapest to implement.

3) Fraud and methods of using stolen data

A statistics report published by Trend Micro states that KSA was ranked first as the most vulnerable of the Gulf countries to cybercrime [27]. Identity theft involves stealing and misusing a user's identity to gain access to resources or to obtain other benefits that are limited to this user. Identity
fraud means the use of a stolen identity to commit criminal acts. The problem occurs when a violator obtains information about someone's identity, such as name, current address, and date of birth, and uses it to commit fraud in that person's name. Identity theft can be committed against the dead as well as the living. An identity thief can do many things with the information he has; the most common are:

• Engage in illegal activities: If a thief gets caught committing a crime using a stolen identity, the fingerprints and criminal records will be put in the victim's name. That will damage the victim's reputation, since the criminal record may cause the victim to fail background checks.

• Obtain a cell phone account: It happens when identity thieves create a cell phone account using the victim's stolen information. The account then runs up large bills in the victim's name.

• Illegal use of credit card accounts: The most famous and most commonly used form of identity theft. A thief obtains the victim's credit card or its information and uses it as much as he wants on the victim's account. Credit card numbers are usually stolen in bulk from e-commerce businesses; they are later sold to gangs to be used.

• Obtaining bank loans: Fraud can be used to get loans simply by obtaining a victim's private data, e.g. national identity number, address, and work information. Such loans are never paid back, and hence the victim's credit history will be damaged.

• Spending victim's checking and saving accounts: Thieves can use the victim's personal information to withdraw money from his bank account, transfer savings, and take his investments.

• Get a new ID: Thieves can use the personal information of a living or dead person to create a new ID, such as a driver's license, in that person's name but with their own photos.

• Unauthorized access to utility accounts: Charging utilities, such as Internet, phone, cable, water, and electricity, using the victim's identity. Thieves can open utility accounts using someone's identity simply by extracting a few details about the victim.

• Black market sales: Hackers use underground online black markets to sell stolen IDs. These black markets are frequently used by hackers around the world to buy stolen data, or to sell it to other hackers locally. Hackers have new ways to make money with SN's profiles. Researchers at VeriSign say that stolen FB profiles are now on sale on the black market [28]. Stolen data such as names, pictures, email addresses, and dates of birth are used to create fake profiles. For example, photographs of famous people are used to create false aliases to lure victims [29].

C. Statistics And Examples Of Identity Theft

Statistics show that more than 600,000 FB profiles get compromised each day, and one in six users said that someone hacked their account and stole their identity. Four in ten users have been victims of cybercrime on SN's, and one in ten users said that they had been a victim of a false link on SN's. Three in four think that cybercriminals are focusing on SN's platforms. Regarding dangerous behavior on SN's, statistics confirm that one in three users does not log out after sessions. One in five users does not check received links before sharing them. One in six users has no clue whether their privacy settings are public or private. Fewer than half of users use a security tool to defend against SN's risks, and only half of users use privacy settings to manage the information they share and with whom. Regarding social friends, 36% have accepted friend requests from strangers, and three in ten users have received messages from strangers [30].

The Communications and Information Technology Commission (CITC) of KSA started a campaign to raise awareness of cybercrimes, but the campaign focused more on the penalties for those crimes [31]. The campaign includes an explanation of the most important types of cybercrimes and the mechanisms to deal with them. The commission has also established a website called Computer Emergency Response Team [32]. The site's goal is to raise the level of awareness and knowledge needed to detect and respond to information security incidents at a national level, and to be an official reference for information security in KSA. CITC also established a website called (internet.sa) to be an Internet service gateway in KSA for services, information, and statistics references [33]. However, there are no confirmed statistics on the number of identity theft victims on SN's in KSA, so we decided to go back to the source.

We communicated with a number of experts/hackers and sent them our questions, which they answered. In KSA, users can find famous Twitter profiles that can help them at no cost if they lose their account to a hacker. The person behind such a profile usually has excellent knowledge of hacking and phishing. They said that nearly 150 victims a week communicate with them. From their point of view, their goals in restoring and stealing accounts are noble, as they consider themselves ethical hackers. Some of them will do it if they see that an account interferes with society's values; others do it in order to help the oppressed victim. Many Arab celebrities seek their help to restore stolen accounts. Many of them did not know that they can contact the technical support team to recover their profiles. On the other side, some hackers do it to sell accounts that have large numbers of followers to someone who is interested in turning them into advertising accounts. The fact that SN's are becoming a potential source of monetary gain for hackers is a major threat to users. This is particularly true for users who do not have a profile on SN's, as some people may use their names in order to make profits by advertising.

Nowadays, almost all companies have official profiles on SN's. Company representatives manage these profiles. They are constantly conscious of what to post in order to preserve the company's public image. Unfortunately, those accounts become targets of unhappy former employees or angry customers who have excellent hacking knowledge. Such profiles may get hacked. Attackers share posts that may tarnish
the company image and cause damage to it. Twitter in KSA contains many fake profiles that may not be detected until the victims announce their profiles in a video or trusted media. The following are examples of the most famous hacked accounts on Twitter and FB in KSA and other places.

An attacker on Twitter impersonated Prince Fahad bin Khalid, the head of the Saudi Arabian football club Al-Ahli. He published false news about the club boycotting sports channels, which affected the Saudi sports community until it was confirmed that the person behind the profile was impersonating the prince [34].

Another fake Twitter account appeared impersonating Prince Abdulaziz bin Fahad Al Saud, who already has a verified profile, using the same image and the same username but with one additional letter. The counterfeit profile imitated the style and character of the prince and lured many people into contacting and believing the attacker [35].

The official Twitter profile of Prince Sultan bin Salman, head of the Saudi Commission for Tourism and Antiquities, was hacked by a hacker group called "Cyber of Emotion". The hacker group published several tweets critical of the tourism sector. The commission affirmed in a statement that the profile was hacked and announced that it welcomes constructive criticism and suggestions from the public. The commission confirmed that it would answer with transparency and clarity all the questions published by the hacker group. It was able to recover the profile with help from Twitter technical support [36].

The same hacker group, "Cyber of Emotion", was able to hack the official Twitter profile of the Principality of Al-Madinah. They launched an attack on the center of information technology, pointing out that the laws do not have validity and do not apply to reality. In a letter addressed to the Principality of Al-Madinah they said, "Dear principality of Al-Madinah: The hacked of this profile is not for personal reasons, but because of the lack of attention to the errors sent from us…Cyber of Emotion." They added, "It appears that the hack is unique today, but we wanted to deliver a message that our existence lies not only in the sites, but here as well?". They concluded: "We apologize for the intrusion of the principality of Al-Madinah, we wish that we will not to be punished for what we did because if it fell in the hands of someone else it will become a bigger scandal…Thank you" [37].

The official profile of the Saudi Ministry of Justice was breached by an anonymous attacker; in his tweets he asked some questions and requested answers. The attacker said that he broke into the account because of the ministry's negligence in information security. He also wondered about the cost of developing the ministry's website [38].

The official Twitter profile of Prince Faisal bin Turki, the head of Al-Nassr football club, was hacked by a fan of the club who called himself "King Bender". He tweeted from the prince's profile to thank him and to congratulate the team on winning the league. The same hacker hacked the official Twitter profile of Prince Abdulrahman Bin Mosaad, chief of Al-Hilal Football Club, which is considered the rival team of Al-Nassr [39].

PayPal's Twitter profile was attacked by a dissatisfied customer who used the platform to complain about its service. PayPal quickly issued an announcement to reassure customers that the attack was limited to the Twitter profile [40].

Burger King's profile was hacked and made to announce that the company had been bought by McDonald's. Fortunately, the disaster turned into a hilarious and successful advertisement for them, and they gained over 60,000 new followers [40].

The Fox News Twitter account was hacked, allegedly by Anonymous. Think Magazine interviewed a member of the Scriptkiddies, who said, "Fox News was selected because we guessed their security would be just as much of a joke as their reporting." [40].

The Syrian Electronic Army hacked the Guardian's Twitter profile. They also hacked other major accounts, such as the Financial Times, BBC News, CBS News, and Associated Press [40].

A number of sports celebrities have also fallen victim to impersonators on FB, including Mohamed Aboutrika, the Al-Ahly player. The number of FB pages that bear his name reached 138. The irony is that Aboutrika does not know anything about these pages, which publish news about him as if he were the page owner [41].

Ahmed Fathy, the Al-Ahly and Egypt national team player, was surprised by a page bearing his name on FB that was publishing false news about him, such as his leaving Al-Ahly club and receiving offers to play professionally for European teams. Fathy said, "I do not have any page on FB or any other website, there are people who pass themselves off as me by using my name and picture." He added: "The number of pages by my name are more than 20 pages and unfortunately, I discovered that a lot of Al-Ahly fans were a victim of those pages and followed them thinking it is my personal page." He concluded: "Many of my friends in the team suffer from the same issues, which caused them a lot of problems because some statements falsely attributed to them on these pages" [41].

It is noteworthy that some official governmental profiles have been penetrated by "hackers". The attacks aimed at abusing the profiles and sending messages in wrong and controversial ways.

From all of the above examples, we see that identity theft is a serious problem with severe consequences. Solving this problem requires intensifying the efforts to follow up on offenders and imposing heavy fines and punishment on them. SN's are now considered a source of information for celebrity news and official government agencies. Thus, their SN's profiles must be verified to prevent the spread of rumors and confusion among people.

D. Security Settings In Social Networks

Since SN's are used by millions of people as a communication platform, a lot of information is associated with users' posts, such as physical locations, user preferences, and social relationships. These sites are the fastest and simplest way to find users' personal information. In particular, the users of Twitter and FB should be concerned about what personal
information they share in their profiles and how others may use make sure of the user‟s identity. Various techniques can be
it. Thus, before users sign up in SN‟s, they need to think about used such as biometric systems e.g. fingerprint and iris
how these websites protect them and if they can trust them to recognition. These systems send the biometric data from the
share their personal information [42, 43]. The next part is an input devices to a central processing unit to identify the identity
explanation of security settings in Twitter and FB. of the users. By using this system, we can guarantee a person‟s
validity and prevent crimes before they occur [55].
1) Security settings in Twitter
 Login Verification: Is off by default. It is so important The problem emerged in the SN‟s because we cannot use
because it makes it harder for an unauthorized person to known identity verification systems; such as, fingerprint and
login to the user account by receiving a confirmation eye print over the Internet. Moreover, it will not be a clever
login request via a text message [44, 45]. move and unsafe for the users. Therefore, the need for other
verification systems became a necessity to make sure of users‟
 Password Reset: It is off by default, and it requires identities, and that they are real and not fake.
user email or phone number to reset his/her password
[44]. In SN‟s, there is a difference between confirmation and
verification. Confirmation is a message sent by the site to the
 Apps: A record of all the applications that have access user's email or phone to secure the account. For identity
to Twitter account. Users can revoke access for confirmation, some sites used phone numbers or emails, and
unwanted applications [46]. others gather the two methods to ensure greater security for the
 Tweet Location: It is off by default but when a user is users. Still the problem of personality verification and the
tweeting from his/her phone the default setting is on possibility of identity theft exist. While verification is an
affirmation from the same site that the owner of this account is
[47].
the same person who uses it.
2) Security settings in FB Twitter and FB offer two services verification and
 Login Notification: It is off by default. This feature confirmation. In confirmation, they use two methods to
notifies the user by email or text messages if his/her confirm the account by email and phone. However, verification
account was accessed from a device they have not used is limited to public figures, celebrities, and governments.
earlier [48]. Verified accounts are the accounts that have a blue badge with
 Login Approvals: It is off by default. It requests from a checkmark, meaning that they are real celebrities or brand
the user to enter a code that was sent to his/her phone profiles and not fake [1, 56].
when he/she accesses his account from a device that
V. RESULTS AND ANALYSIS
they never used before. After logging in, the user can
save this new device into his/her account list of trusted We conducted an online survey to study the user's
devices [49]. knowledge about the importance of privacy and security in
SN‟s and the extent of user awareness of these issues in the
 Code Generator: It is part of login approvals, and it is KSA. The purpose of this survey is to evaluate the amount of
creating a security code every 30 seconds even when information that the users typically disclose and what the
users are not connected to the Internet. In addition, the privacy settings they have applied to protect their profiles. The
user can use it when resetting his password [50]. survey consists of 18 questions with a total of 510 users,
 App Passwords: It is off by default. A one-time representing ten cities “Fig. 1”, participated in the survey.
password the user can use to login to his FB App and it
helps to keep FB‟s original password safe [51]. 70
58.1
60
 Trusted Contacts: User closest friends who can
50
securely help him if he/she has a problem to access
his/her account. For example, when a user forgets his 40
password, and he cannot access his email to reset it 30 21.3
18.3
[52]. 20

 Logged-in List: Shows a list of all user browsers and 10 0.2 0.2 0.9 0.2 0.2 0.2 0.2
devices that were used to login to a user‟s account 0
recently [53].
 One-time Password: It is password used when a user is
not comfortable to enter his real password. For
example, when the user is in a public place [54]. Fig. 1. Survey total responses rank by cities
E. Identity Verification Systems in SN’s
A. Survey Summary
Fake profiles mostly post misleading information, images
and other data that eventually deforms the victim‟s public When analyzing quantitative and qualitative data, the
image. Therefore, to reduce the possibilities of theft and fraud, highest response rate (57.1%) was from female participants,
appropriate identification systems must be used in order to


while the lowest response rate (42.9%) was from male participants “Fig. 3”.

Most respondents were aged 16-20 years (23.3%), while the fewest were aged 41-45 years (4.7%) “Fig. 2”.

For the level of education, “Fig. 4” shows that most participants were undergraduate users (38.2%), and Twitter was the most popular SN among them. The youngest participants were elementary school users (4.9%), and Instagram was the most popular SN among this group “Fig. 16”.

58.6% of participants agreed that the benefits of SN's outweigh their risks; therefore, 58.3% of the users want to verify their profiles on SN's “Fig. 5”.

6.5% of participants do not use SN's “Fig. 6”. The reason that prevents 48.5% of them from signing up is social and religious irregularities, especially the many inappropriate profiles on SN's. The second reason is that SN's are not safe in terms of privacy (42.4%) “Fig. 7”.

On the other hand, 93.5% of all participants have signed up for SN's. 63.9% of them selected “to communicate with their friends and families”, and 57.7% of them selected “finding useful information” “Fig. 8”.

We found that participants prefer to write part of their real name on their profiles (45.5%) rather than their full real name (32.5%) or a fake name (22.0%) “Fig. 9”.

43.6% of participants choose to make their profiles public to everyone, and only 31.7% prefer to make their profiles private to friends and family “Fig. 10”.

The three most prevalent pieces of information users share are email (64.2%), city (38.4%), and phone number (34.8%) “Fig. 11”.

The results show that 55.4% of users only trust news and information from verified profiles, while 26.4% of users always check the well-known newspapers. In contrast, 18.2% of users believe any news and information on SN's “Fig. 12”.

50.1% of participants are confident in their level of privacy, whereas 49.9% are not sure of their level of privacy on SN's “Fig. 13”.

75.5% of users do not read the privacy policy “Fig. 14”, and maybe that is the reason that 78.0% of them do not trust the SN providers to protect their personal information “Fig. 15”.

B. Discussion
1) People who do not own a SN’s account
33 of the participants do not have accounts in SN's; most of them are older than 46 years. The main reasons are that they find these sites overwhelmed with social and religious irregularities and unsafe in terms of privacy and security. They see the bad side of SN's, where there is a lot of impersonation, rumors and harassment. Some of them find it hard to fully understand SN's and get used to them. Others simply do not find the time or the interest to be a part of such sites.

We have found that older people do not want to join SN's because of the negative sides. They consider SN's an unsafe environment. Youngsters have not fully understood the value and importance of privacy in protecting their personal information. On the contrary, they share every single detail about their personal life on their accounts.

Therefore, this information becomes a cause of defamation for them, and it can be used against them in the future. The majority of concerned parents do not want their children to join this community because their children are not aware of the meaning of security policies and related problems, and they are not old enough to make smart decisions. For that reason, children and many adults need to be given enough knowledge and awareness about how to protect their information and maintain their privacy.

2) People who do have a SN’s account
The ages of the participants are diverse; the most interactive groups involved in SN's are teenagers and young adults. While 58% of the participants believed that the benefits of SN's outweigh their risks, teenagers and young adults were the two groups that thought otherwise “Fig. 17”. The majority of these two groups believed that the risks outweigh the benefits, yet they have active accounts. Female participants outnumbered males by a narrow margin of about 14%; this could be evidence that there are more female users than male users.

The results show that the most used SN's in KSA are, in order, Twitter, Instagram, YouTube, FB, and Google+. The most popular SN among elementary and middle school users is Instagram, whereas for high school, undergraduate and higher education users it is Twitter “Fig. 16”. As for the benefits of SN's, the main purposes were to entertain and spend time, find useful information and, of course, communicate with friends and family. Teenagers score the highest rate in selecting a fake name or a part of their real name; as age increases, participants choose to write their full real names.

The level of privacy differed according to the level of education. School students and undergraduates were more open, as they chose to make their profiles public, while some of them did not care about their privacy settings or did not know about them. In contrast, the higher education participants were more conservative about their privacy, as they want their profiles to be private to their friends and families “Fig. 18”.

As a result, the users who made their profiles public were the majority of participants who wanted to verify their profiles, while the users who made their profiles private were the minority of participants who wanted to verify their profiles.

The information that users share the most on SN's is, in order, email, city, phone number, interests, images, local location, education and marital status. 15.5% of the participants share nothing on their profiles, and those are the ones who did not care about verifying their profiles, as they did not find the need to protect their information.
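As a rough consistency check on the survey figures quoted above, the short sketch below recomputes a few of them from the stated sample size of 510 participants. The function and variable names are illustrative only, and rounding to whole respondents is an assumption, since the survey summary reports percentages rather than raw counts.

```python
# Figures quoted in the survey summary; all names here are illustrative.
TOTAL_PARTICIPANTS = 510


def share_to_count(percent: float, total: int = TOTAL_PARTICIPANTS) -> int:
    """Convert a reported percentage into an approximate head count."""
    return round(percent / 100 * total)


non_users = share_to_count(6.5)     # participants without an SN account
sn_users = share_to_count(93.5)     # participants with an SN account
gender_gap = round(57.1 - 42.9, 1)  # female minus male share, in percentage points

print(non_users, sn_users, gender_gap)
```

Under these assumptions the 6.5% share corresponds to about 33 people, which agrees with the count given in the discussion, and the gap between female and male response rates is 14.2 percentage points, in line with the margin of about 14% mentioned there.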


Teenagers are the most confident in the news and information published on SN's sites, whereas the rest of the participants do not trust any information except for news published by verified government profiles. However, in many cases, these accounts may be exposed to theft and could release a lot of misleading information before they are restored.

The biggest problem in misunderstanding how SN's deal with users' information is related to the fact that users never read the privacy policy. A lot of users do not know how SN providers protect or deal with their personal information. Therefore, the majority of the participants do not trust SN providers.

C. The Social Experiment
1) Twitter social experiment
For our first Twitter experiment, we tested users' knowledge and awareness of security. We sent them a link in a broadcast as follows: “To verify your account on twitter quickly just click on the following link.” On the first page, we asked the participant to enter his/her Twitter username. When they did that, we directed them to page number two, where we described our experiment to them.

The aim was to gauge people's confidence in anonymous links that ask them to provide private information. Even though we asked only for something general, their username, the result shows how willing they are to give personal information to anonymous senders and strangers. 198 users (43%) followed the link; only 55 of them (28%) did not enter their usernames “Fig. 19”.

143 users (72%) entered their Twitter usernames. The reason, from their point of view, is that they consider usernames public and okay to share. However, this information is valuable to a stranger who does not know that the respondent owns a Twitter account in the first place. Through this information, he can monitor them, learn their information, and even hack their profile or email to gain access to more information.

TABLE I. TWITTER LINK
 | Elementary School | Middle School | High School | Undergraduate | Higher Education
Users who enter the link | 14 | 50 | 55 | 55 | 24
Users who enter the username | 9 | 46 | 40 | 35 | 13
Users who didn't enter the username | 5 | 4 | 15 | 20 | 11
*Check the appendix

2) FB and Twitter experiment
In this experiment, we wanted to prove that penetrating private accounts is very easy. The idea is to select a public profile at random, clone the profile with a similar username, and send friend requests to friends who have private profiles. Of course, if the friend accepted our request, we were able to see all of their information. (All the people whom we contacted were sent an apology message, and their data was not touched.) We did the same experiment on two public accounts, one from FB and the other one from Twitter.

On FB, we chose a public account X for our experiment. Then we created a new account with the same name. In addition, we changed the profile picture and the cover photo to the same real ones. After that, we sent requests to private accounts on X's list. The total requests we sent were 121, and the responses we got were 30, about 24.8%.

On Twitter, we chose a public account Y for our experiment. Then we created a new account with a username Y' (close to the real username), and we changed the profile picture to the same real one. After that, we requested to follow the private accounts. The total requests we sent were 75, and the responses we got were 33, about 44%.

The results from this experiment show us many points:
• Public profiles can be cloned very easily on both networks.
• Public profiles can endanger private ones on their lists.
• Friends of the victims show high acceptance for any incoming messages from their fake friends.
• We noticed that FB provides more settings to protect privacy and security.
• Twitter users are more willing to accept such requests than those of FB.

Of course, we realize that the numbers we got cannot be trusted because of the small sample. However, for ethical reasons, we did not want to involve more accounts in this experiment, as the goal was just to prove the simplicity of the process.

VI. WAYS TO PROTECT OUR PRIVACY
Based on this research, we would like to emphasize that privacy is a personal responsibility; users should not give their personal information to others who may use it illegally. SN's continually change their privacy policies to protect themselves and to put the blame on the users. According to [57, 58], users can do the following:
• Secure your PC against theft: Activate hard disk encryption and always use a password to access your devices. Continually update your operating system and security packages, and review your web browser security settings. Install and automatically update your anti-virus and anti-spyware software, use an anti-phishing tool, and use a good firewall.
• Be careful when you share your personal data: Always be careful about sharing your personal information with strangers and trusted people, especially over an unsecure medium.


• Do not use one email: Create a new email for SN's sites and do not use your personal email or work email to sign up for SN's. Do not share the email that you used to register with anyone, and do not choose a username that is similar to your email. If a hacker knows your email, you make it easier for him to send you viruses or suspicious links. Make sure to use a real email in your profile to help you restore a forgotten password.
• Create a difficult password: Users should not use weak passwords in personal accounts or use the names of relatives as passwords, as these can be easily guessed. Many software systems can identify weak passwords. Moreover, users should not use the same password for all accounts. When creating a new password, use letters, numbers, and symbols at the beginning or end of the password. Users should always try to make the password at least 12 characters long so that it is hard to predict or break, and never use a password similar to their email or username.
• Be careful with any message from a stranger: Attackers use different methods and different messages to reach their victims. A careful user will always verify the sender before responding to such messages.
• Be careful of phishing: Phishing is an attempt to gain sensitive data such as usernames, passwords, and credit card information by impersonating trustworthy pages in SN's. Some hackers will pretend to help you promote your profile by increasing your followers or protecting your account, and will then ask you to enter your personal data in false sites that look similar to the real one. To protect against phishing attacks, experts advise paying attention, not opening suspicious links, and ignoring any request to enter your personal data unless it comes from the official site; otherwise your account will be at risk. Furthermore, make sure to access the company's website via the correct URL.
• Stay away from spyware: Programs and messages reach many users through SN's and email. Some messages contain wrong information aimed at stealing users' data. Users need to ignore messages whose origin they do not know and update antivirus software regularly.
• Do not download unknown software for portable devices: Many users download unknown software or games to their devices. Many such applications can be malware. Users need to be careful when loading any unknown programs; they should only deal with websites of reliable and well-known companies.
• Avoid the use of public computers or networks: Many risks exist when using public devices or networks, such as in libraries, cafes and airports, especially when reviewing financial statements. If it is necessary, use them carefully: delete personal files, cookies, and Internet history, and never forget to log out.
• Verify job posters: Before sending an application online or in person, make sure that the advertiser is a real company.
• Make your profile “private”: Based on our experiment, users should always choose to have private accounts and be cautious when accepting requests from friends who have public accounts.

VII. CONCLUSION AND FUTURE WORK
There is no doubt that SN's provide a wide range of benefits to users. However, those benefits are not free of risks. Severe risks of privacy breaches and identity theft exist. The facts that the majority of users are not aware of such risks and that SN providers lack proper protection and verification methods make the situation much worse.

In KSA, our survey revealed many results, some of which are scary. The fact that 59% of all participants believe that the benefits of SN's outweigh their risks justifies the unprecedented increase in the number of users. When 43% of all profiles are public, and 25% of users are not aware of their privacy settings, the privacy of those with private profiles is at great risk. Other results show that the majority never read privacy policies, share too much information, and have confidence in SN's.

Our small experiment revealed that users with private accounts are not safe either. Having friends with public profiles endangers the privacy of such users.

To improve the privacy of users and reduce their risks, we provided a list of recommendations. We believe that awareness at a national level is needed, users need to be cautious and protective, and SN providers need to provide more protection and verification to users.

Finally, from the experiment's results we found that Facebook provides more privacy and security tools than Twitter. Therefore, identity theft cases on Facebook appear to be fewer than those on Twitter.

REFERENCES
[1] Facebook, (2014). Facebook Website. [online] Available at: http://www.Facebook.com [Accessed 12 Sep. 2014].
[2] K. Finklea, 'Identity Theft: Trends and Issues', Congressional Research Service, 2014.
[3] S. Gazette, 'Use of mobiles in social media on the rise in KSA', Saudigazette.com.sa, 2014. [Online]. Available: http://www.saudigazette.com.sa/index.cfm?method=home.regcon&contentid=20140109192016. [Accessed: 17-Apr-2014].
[4] S. Al-Qahtani, 'hackers loot 2.5 billion SR from 3.5 million Saudi in one year', Alsharq.net.sa, 2013. [Online]. Available: http://www.alsharq.net.sa/lite-post?id=792821. [Accessed: 24-Nov-2013].
[5] Citc.gov.sa, 'Anti-Cyber Crime Law', 2007. [Online]. Available: http://www.citc.gov.sa/English/rulesandsystems/citcsyste/pages/cybercrimesact.aspx. [Accessed: 17-Nov-2014].
[6] Reznik, Maksim (2013) "Identity Theft on Social Networking Sites: Developing Issues of Internet Impersonation," Touro Law Review: Vol. 29: No. 2, Article 12.
[7] K. Krombholz, D. Merkl and E. Weippl, 'Fake identities in social media: A case study on the sustainability of the Facebook business model', Journal of Service Science Research, vol. 4, no. 2, 2012.


[8] Stutzman, Fred and Kramer-Duffield, Jacob (2010): Friends only: examining a privacy-enhancing behavior in Facebook. In: Proceedings of ACM CHI 2010 Conference on Human Factors in Computing Systems 2010. pp. 1553-1562.
[9] Verma, D. Kshirsagar and S. Khan, 'Privacy and Security: Online Social Networking', Association of Computer Communication Education for National Triumph (ACCENT), vol. 3, no. 8, pp. 310-315, 2013.
[10] L. Bilge, T. Strufe, D. Balzarotti and E. Kirda, 'All your contacts are belong to us: automated identity theft attacks on social networks', pp. 551-560, 2009.
[11] M. Al-Mujeb. “Saudis' Awareness of Privacy Risks on Facebook” M.A. thesis, University of Glamorgan, UK, 2010.
[12] R. Demyati. “Privacy Issues in Facebook” M.A. thesis, Australia, 2011.
[13] Catherine D. Marcum, George E. Higgins (April 28, 2014 by CRC Press) Social Networking as a Criminal Enterprise. Criminal Justice & Law [Online]. Available at: http://www.crcpress.com/product/isbn/9781466589797 (Accessed: 31 Nov 2014).
[14] Joy Mali (2012) Identity Theft Through Social Networking, Available at: http://www.lifehack.org/articles/technology/identity-theft-through-social-networking-lessons-take-now.html (Accessed: 14 Nov 2014).
[15] Bernard Pragides, Identity Theft and Social Networking Sites, Available at: http://www.streetdirectory.com/travel_guide/140727/identity_theft/identity_theft_and_social_networking_sites.html (Accessed: 14 Nov 2014).
[16] T. Clinic Team, 'The State of Social Media in Saudi Arabia 2012', The Social Clinic, 2013. [Online]. Available: http://www.thesocialclinic.com/the-state-of-social-media-in-saudi-arabia-2012-2/. [Accessed: 08-May-2014].
[17] T. Clinic Team, 'The State of Social Media in Saudi Arabia 2013', The Social Clinic, 2014. [Online]. Available: http://www.thesocialclinic.com/the-state-of-social-media-in-saudi-arabia-2013/. [Accessed: 08-May-2014].
[18] Smith, 'Why Americans use social media', Pew Research Center's Internet & American Life Project, 2011. [Online]. Available: http://www.pewinternet.org/2011/11/15/why-americans-use-social-media/#fn-250-1. [Accessed: 14-Jun-2014].
[19] Mbrsg.ae, 'Arab Social Media Outlook 2014', 2014. [Online]. Available: http://www.mbrsg.ae/HOME/PUBLICATIONS/White-Papers-(1)/Arab-Social-Media-Outlook-2014.aspx?lang=en-US. [Accessed: 23-Oct-2014].
[20] S. HIMELFARB and S. ADAY, 'Media That Moves Millions', Foreign Policy, 2014. [Online]. Available: http://www.foreignpolicy.com/articles/2014/01/17/media_moves_millions_social_ukraine_twitter. [Accessed: 07-Nov-2014].
[21] M. Habash, 'How does Facebook achieve its profits?', Tech-wd, 2012. [Online]. Available: http://www.tech-wd.com/wd/2012/05/28/how-can-facebook-make-money/. [Accessed: 28-Oct-2014].
[22] Facebook.com, 'Under the Hood: The Entities Graph', 2014. [Online]. Available: https://www.facebook.com/notes/facebook-engineering/under-the-hood-the-entities-graph/10151490531588920. [Accessed: 13-Nov-2014].
[23] Carrns, 'Careless Social Media Use May Raise Risk of Identity Fraud', Bucks Blog, 2012. [Online]. Available: http://bucks.blogs.nytimes.com/2012/02/29/careless-social-media-use-may-raise-risk-of-identity-fraud/?_php=true&_type=blogs&_php=true&_type=blogs&_r=1. [Accessed: 14-May-2014].
[24] INVISUS (2014) How Identity Theft Happens, Available at: http://www.idefendfamily.com/how_id_theft_happens.aspx#cs (Accessed: 15 Nov 2014).
[25] Gunatilaka, A Survey of Privacy and Security Issues in Social Networks, 1st ed. 2011.
[26] Carnegie Mellon University (2014) How cyber criminals operate, Available at: http://www.carnegiecyberacademy.com/facultyPages/cyberCrime.html (Accessed: 15 Nov 2014).
[27] P. DWYER, CYBER CRIME IN THE MIDDLE EAST, 1st ed. 2010.
[28] RIVA RICHMOND (May 2, 2010) 'Stolen Facebook Accounts for Sale', The New York Times, pp. B3 [Online]. Available at: http://www.nytimes.com/2010/05/03/technology/internet/03facebook.html?_r=0 (Accessed: 15 Nov 2014).
[29] Hayley Dixon (2013) 'Online dating sites use stolen data to create fake profiles, it is alleged', The Telegraph, 29 Jul, p. http://www.telegraph.co.uk/news/uknews/law-and-order/10207712/Online-dating-sites-use-stolen-data-to-create-fake-profiles-it-is-alleged.html.
[30] 2012 NORTON CYBERCRIME REPORT, 1st ed. 2012.
[31] Citc.gov.sa, 'Cybercrime Awareness Campaign', 2014. [Online]. Available: http://citc.gov.sa/arabic/MediaCenter/awarenesscampaigns/Pages/PR_AWR_005.aspx. [Accessed: 08-Oct-2014].
[32] Cert.gov.sa, 'Computer Emergency Response Team - SA', 2014. [Online]. Available: http://www.cert.gov.sa/index.php?option=com_frontpage&Itemid=1. [Accessed: 07-Oct-2014].
[33] Internet.sa, 'Reporting on impersonation (Twitter) | Internet Saudia Arabia', Web1.internet.sa. [Online]. Available: http://web1.internet.sa/ar/twitter_impersonate/. [Accessed: 08-Oct-2014].
[34] Alarabiya.net, 'Hacker penetrates the head of Al-Ahli club account in Twitter', 2014. [Online]. Available: http://www.alarabiya.net/articles/2012/04/15/207846.html. [Accessed: 10-Nov-2014].
[35] H. Lhd, 'User impersonating Prince Abdulaziz bin Fahd on Twitter', Burnews.com, 2014. [Online]. Available: http://www.burnews.com/news/2012/03/31/مغرد-ينتحل-شخصية-االمير-عبذ-العزيز-بن-فهذ-على-التويتر. [Accessed: 14-Nov-2014].
[36] M. Houdad, '"Tourism" undertakes to answer the questions from the hacker', Sabq.org, 2014. [Online]. Available: http://sabq.org/kvngde. [Accessed: 14-Nov-2014].
[37] K. Ashamany, '"Hacker" penetrates the profile of principality of Al-Madinah in "Twitter"', Sabq.org, 2014. [Online]. Available: http://sabq.org/Ijngde. [Accessed: 17-Nov-2014].
[38] Y. Al-Otaibi, '"Hacker" penetrates Ministry of Justice account in "Twitter"', Sabq.org, 2014. [Online]. Available: http://sabq.org/W1ngde. [Accessed: 17-Nov-2014].
[39] Barqawi, 'For thanking and challenging', Sabq.org, 2014. [Online]. Available: http://sabq.org/u7Xfde. [Accessed: 17-Nov-2014].
[40] J. Philips, '7 Examples of What Happens When Your Twitter Account is Hacked - Jeffbullas's Blog', Jeffbullas's Blog, 2013. [Online]. Available: http://www.jeffbullas.com/2013/07/12/7-examples-of-what-happens-when-your-twitter-account-is-hacked/. [Accessed: 03-Oct-2014].
[41] H. Zayed, 'Impersonation on "Facebook"', Alwatan.com.sa, 2014. [Online]. Available: http://www.alwatan.com.sa/Nation/News_Detail.aspx?ArticleID=76105. [Accessed: 17-Nov-2014].
[42] Beach, M. Gartrell and R. Han, 'Solutions to security and privacy issues in mobile social networking', vol. 4, pp. 1036-1042, 2009.
[43] Securityinabox.org, 'How to Change Basic Account Settings on Twitter | Security In A Box', 2014. [Online]. Available: https://securityinabox.org/twitter_basic#2.1. [Accessed: 20-Sep-2014].
[44] L. Hardwick, 'How to improve your Twitter security and privacy', Naked Security, 2014. [Online]. Available: http://nakedsecurity.sophos.com/2014/08/26/how-to-improve-your-twitter-security-and-privacy/. [Accessed: 26-Sep-2014].
[45] Support.twitter.com, 'Twitter Help Center | Using login verification', 2014. [Online]. Available: https://support.twitter.com/articles/20170388. [Accessed: 24-Sep-2014].
[46] Support.twitter.com, 'Twitter Help Center | Connecting or revoking third-party applications'. [Online]. Available: https://support.twitter.com/groups/57-safety-security/topics/276-understand-your-settings/articles/76052-connecting-or-revoking-third-party-applications. [Accessed: 08-Oct-2014].
[47] Support.twitter.com, 'Twitter Help Center | Adding your location to a Tweet'. [Online]. Available: https://support.twitter.com/groups/57-safety-security/topics/276-understand-your-settings/articles/122236-adding-your-location-to-a-tweet. [Accessed: 08-Oct-2014].
[48] Facebook.com, 'What are login notifications? | Facebook Help Center | Facebook'. [Online]. Available: https://www.facebook.com/help/162968940433354. [Accessed: 08-Oct-2014].
[49] Facebook.com, 'What are login approvals? How do I turn this setting on? | Facebook Help Center | Facebook'. [Online]. Available: https://www.facebook.com/help/148233965247823. [Accessed: 26-Sep-2014].
[50] Facebook.com, 'What is Code Generator? How does it work? | Facebook Help Center | Facebook'. [Online]. Available: https://www.facebook.com/help/270942386330392. [Accessed: 27-Sep-2014].
[51] Facebook.com, 'How do I use app passwords? | Facebook Help Center | Facebook'. [Online]. Available: https://www.facebook.com/help/249378535085386/. [Accessed: 28-Sep-2014].
[52] Facebook.com, 'What are trusted contacts? How do I add trusted contacts to my account? | Facebook Help Center | Facebook'. [Online]. Available: https://www.facebook.com/help/119897751441086. [Accessed: 28-Sep-2014].
[53] Facebook.com, 'How can I manage where I'm logged into Facebook? | Facebook Help Center | Facebook'. [Online]. Available: https://www.facebook.com/help/174571515935086. [Accessed: 08-Oct-2014].
[54] Facebook.com, 'What's a one-time password and how do I get one? | Facebook Help Center | Facebook'. [Online]. Available: https://www.facebook.com/help/214309978590084. [Accessed: 08-Oct-2014].
[55] Identity Verification and Social Information, 1st ed. Trulioo Information Services Inc.
[56] Twitter.com, 'Welcome to Twitter'. [Online]. Available: http://www.twitter.com. [Accessed: 08-May-2014].
[57] Rami, 'How to protect yourself from identity theft online', Alwakei.com, 2012. [Online]. Available: http://www.alwakei.com/news/19894/index.html. [Accessed: 02-Apr-2014].
[58] K. Williams, A. Boyd, S. Densten, R. Chin, D. Diamond and C. Morgenthaler, 'Social Networking Privacy Behaviors and Risks', Seidenberg School of CSIS, Pace University, USA, 2009.
[59] M. Whitman and H. Mattord, Principles of information security, 1st ed. Boston, Mass.: Thomson Course Technology, 2003.

APPENDIX

Fig. 2. Participants ages

Fig. 3. Participants gender

Fig. 4. Participants level of education

Fig. 5. Does the benefits of SN's overcomes the risks

Fig. 6. Participants Profiles in SN's

Fig. 7. Reasons that prevent users from participating in SN's


Fig. 8. Reasons participants use SN's

Fig. 9. User's name on SN's

Fig. 10. The level of privacy

Fig. 11. Information users share on SN's

Fig. 12. Users confidence in news on SN's

Fig. 13. User's confidence in their level of privacy

Fig. 14. Do you read the privacy policy

Fig. 15. Users confidence in SN's providers

Fig. 16. Relation between users ages and most used SN's


Fig. 17. Relation between users' level of education and benefits overcomes the risks

Fig. 18. Relation between users education and level of their profiles privacy

Fig. 19. First Twitter experiments

TABLE II. FB AND TWITTER EXPERIMENTS

Required Information to Sign-up
 | Facebook | Twitter
First Name | √ | √
Family Name | √ | √
Email Address | √ | √
Date of Birth | √ | X
Gender | √ | X
Phone Number | √; Not required at first, but later, when the user signs in, he/she must enter a phone number to confirm the account.
Confirmation Message | In order to complete the registration, FB sends a confirmation message to the user's email. | In order to complete the registration, Twitter sends a PIN to the user's phone.

Privacy and Security
Facebook and Twitter provide a lot of privacy options to protect user security, which the user can customize as needed.

Experiment Results
 | Facebook | Twitter
Number of Friend Requests Sent By Us | The total requests we sent are 121, and the total responses we got are 30. | The total requests we sent are 75, and the total responses we got are 33.
Percentage | 24.8% | 44%
The results of the experiment confirmed that Facebook provides more options for privacy and security than Twitter. Therefore, identity theft cases on Facebook appear to be fewer than those on Twitter.


Software Architecture Reconstruction Method, a Survey

Zainab Nayyar
Department of Computer Engineering, EME, National University of Science and Technology (NUST), H-12, Islamabad, Pakistan

Nazish Rafique
Department of Computer Engineering, College of EME, National University of Science and Technology (NUST), H-12, Islamabad, Pakistan

Abstract—Architecture reconstruction is a reverse engineering process in which we move from the code level to the architecture level in order to reconstruct the architecture. Software architectures are the blueprints of projects and depict the external overview of the software system. Maintenance and testing often cause software to deviate from its original architecture: to enhance functionality, the software departs from its documented specifications, and new modules are added without updating the architecture, which creates problems when the system is later reconstructed. The closer the software is to its architecture, the easier it is to maintain and to update the documentation, so the conformance of the architecture with the product is checked by applying reverse engineering. Another reason for reconstructing the architecture arises with legacy systems, when they need modification or an enhanced version of the system has to be developed. This paper surveys the methods and tools involved in reconstructing the architecture and, by comparing them, suggests the best method for architecture reconstruction.

Keywords—Software architecture; reverse engineering; architecture reconstruction; architecture erosion; architecture mismatch; architecture chasm; architecture drift; forward engineering; architectural aging

I. INTRODUCTION

Many organizations use old software, but as technology advances day by day there is often a need to adapt that software to current technology. Changing the code can be difficult, however, because over time the documents that describe the implementation become outdated or go missing. Developing new software from scratch is usually not favored, so software architecture reconstruction is used to recover the architecture and then to document and update it.

Certain problems arise while maintaining and understanding a system. The first is that the architecture of a system is usually not explicitly represented in the system, unlike classes and packages; another is that many large and important applications were developed over time, so their architecture drifts [1]. These problems are addressed by software architecture reconstruction [2].

Software architecture is a model of the software system expressed at a high level of abstraction. The architectural view of the system hides details of implementation, data representation and algorithms, and concentrates only on developing a link between requirements and implementation. The software architecture depicts the tangible entities of a system and the relationships between those entities. The role of software architecture in developing software is to support understanding, reuse, construction, evolution, analysis and management of a system [3].

However, only a few organizations participate in software architecture reconstruction efforts. The architecture of software systems plays a significant role in attaining specific business goals. It is therefore very important to understand the environment of the organization and the importance of software architecture, so that the software architecture can be outlined efficiently [4].

The architecture of software is designed to validate and verify which requirements can be implemented and which cannot. The architecture of a software system generally restricts the developer to the scope; the closer the software is to the architecture, the easier it is to validate its conformance with the requirements.

The architecture reconstruction process is iterative and interactive. It consists of four steps. In the first step, a set of views is extracted from the software implementation, such as source code and dynamic information. These views represent the system's essential structural and behavioral components. The second step consists of fusing the extracted views; it creates fused views that enhance and improve the extracted views. In the third step, the analyst iteratively and interactively improves the fused views and applies design patterns to them to reconstruct the architectural-level views. Design patterns help the analyst understand the architecture of the system as structural and behavioral relationships among different components. In the last step, the derived views are further investigated to evaluate the conformance of the architecture, to identify goals for reengineering or reuse, and to analyze the essential qualities of the architecture [5].

II. LITERATURE REVIEW

This section presents an extended review of the research work done so far on software architecture reconstruction. It also includes a detailed discussion of the tools and techniques used for architecture reconstruction.


Software architecture reconstruction terminology is incomplete without terms such as forward engineering, reverse engineering, architectural aging, architectural drift, architectural erosion and architectural mismatch. Forward engineering comprises the normal set of steps for developing a system, proceeding from requirements gathering to the implementation phase. In reverse engineering the inverse process is carried out: programming details are used to uncover the hidden details of the system's architecture [7]. For reverse engineering, the important data of the system is extracted; the extracted information then leads to the high-level design of the system, and the high-level information in turn helps the developer understand the architecture of the particular system [6]. The factors that cause software to lose its architecture, and thus lead to architecture reconstruction, are discussed under architecture aging, which occurs through architecture erosion, architecture drift or architecture mismatch. Violations of the architecture mostly cause the architecture to erode. A similar situation is observed in architecture drift, which results from ambiguities and from not developing the system according to the architecture. Sometimes a gap arises between the architecture and the code of the system due to maintenance, testing, etc.; this is known as architecture mismatch [7]. Figure 1 demonstrates the concept of forward and reverse engineering.

Fig. 1. Forward & Reverse Engineering

A. Tools and methodologies for software architecture reconstruction:

This section highlights the significant ways of reconstructing the architecture of a system and evaluates the most suitable method for doing so. The techniques explained in this paper are bottom-up techniques, top-down approaches and hybrid techniques.

B. Bottom up Techniques:

In the bottom-up technique, information gathering for reconstructing the architecture starts at the lowest level by collecting facts and then aggregates that knowledge to higher levels. The results of source code analysis are stored in a repository, which is queried to obtain abstract representations of the system [9] [3]. The process of bottom-up techniques is shown in figure 2.

Fig. 2. View extraction from source code (1) and then refinement of extracted views (2)

There are many bottom-up techniques, but only ARMIN, Dali and Rigi are discussed in this paper because of their good results.

a) Architecture Reconstruction and Mining: ARMIN (Architecture Reconstruction and Mining) is an architecture reconstruction tool developed by the Software Engineering Institute and Robert Bosch Corporation. Once data is gathered, the relationships are manipulated further; this includes collecting, organizing and collapsing. In the end the results can be viewed in an aggregator [10]. The architecture reconstruction method using ARMIN consists of two steps.
• The first step is source information extraction: the elements and their relations are extracted from the system and inserted into ARMIN.
• The second step is architectural view composition: views of the system's architecture are produced by abstracting the source information through aggregation and manipulation. The views are offered to the reconstructor within the ARMIN tool; the user can traverse and manipulate them.

The source code and other information are used as input to the tool. The reconstruction process results in architectural views that are presented to the user in the view generator component of the tool. The user can manipulate the views according to his or her requirements and can generate more views. Figure 3 shows the working of ARMIN.
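
The bottom-up flow described above (gather low-level facts, store them in a repository, then aggregate them into higher-level views) can be illustrated with a minimal Python sketch. It does not reproduce ARMIN's actual interface: the fact tuples, the `contains` and `calls` relation names and both helper functions are hypothetical, chosen only to show how function-level call facts might be collapsed into a module-level view.

```python
from collections import defaultdict

# Hypothetical low-level facts, as a fact extractor might emit them:
# (relation, source element, target element)
FACTS = [
    ("contains", "ui", "render"),
    ("contains", "ui", "on_click"),
    ("contains", "core", "compute"),
    ("contains", "db", "load"),
    ("calls", "on_click", "compute"),
    ("calls", "render", "compute"),
    ("calls", "compute", "load"),
    ("calls", "render", "on_click"),
]


def build_repository(facts):
    """Index the facts by relation name so that they can be queried later."""
    repo = defaultdict(set)
    for relation, src, dst in facts:
        repo[relation].add((src, dst))
    return repo


def collapse_calls(repo):
    """Lift function-level 'calls' facts to module-level dependencies."""
    # Map every function to the module that contains it.
    owner = {func: module for module, func in repo["contains"]}
    module_deps = set()
    for caller, callee in repo["calls"]:
        src, dst = owner[caller], owner[callee]
        if src != dst:  # calls inside one module are hidden at this level
            module_deps.add((src, dst))
    return module_deps


repo = build_repository(FACTS)
module_view = collapse_calls(repo)
print(sorted(module_view))
```

In this toy data set the four call facts collapse into two module-level dependencies, because the call from `render` to `on_click` stays inside the `ui` module and two calls map to the same `ui` to `core` edge.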


Fig. 3. Architecture Reconstruction Method using ARMIN

The big advantage of ARMIN over the other bottom-up software reconstruction techniques is that when more than one view is generated it also stores the previous views, which other tools cannot do [11][12][14]. More detail on ARMIN and its usage can be found in [13] [14].

b) Rigi: Rigi is a research tool used to understand large knowledge spaces, for example software programs, architecture documentation, and the World Wide Web. This is achieved by a reverse engineering method that models the system by obtaining objects from the knowledge space, organizing them into high-level abstractions, and presenting a graphical model of the given system [22]. The working process of Rigi is shown in figure 4.

Fig. 4. Rigi Architecture

The main activities of Rigi's architecture include extraction of facts from existing systems, a repository that represents and stores the facts, and analysis and visualization of the facts.

• Fact extraction: The process of reverse engineering starts with extracting facts from the software's sources. Sources can be inherent artifacts that are essential to compile and build the system, or supporting artifacts. A fact extractor can be constructed for a particular language. Extractors can be further divided into two kinds: parser-based and lexical extractors. Parsers produce a parse tree without uncertainties, whereas lexical extractors are built on pattern matching with regular expressions [23].
• Repository data model: The significant component is the repository. It stores all the facts extracted from the target system. Information stored in the repository is presented to the user with visualizers.
• Graph-based editor: The essential part of Rigi is a graph editor, rigiedit. Rigi's functionality is similar to that of basic graph editors. Graphs can be loaded, saved, and laid out; the windows depicting a graph can be scrolled and zoomed; the nodes and arcs can be selected, cut, copied, and pasted in a graph. Examples include computation of cyclomatic complexity. Rigi joins graphical visualization with textual reports to offer information about the graphs at different degrees of detail [23].

c) Dali: The Dali architecture is a structure that aims to combine an extensive variety of extraction, analysis, manipulation, and presentation tools. In Dali's structural design, rectangles represent different tools and lines depict the data flow among them. The structural design of Dali is shown in figure 5.

Fig. 5. Dali Structural Design

To extract the source model there is a variety of tools, such as lexical, parser-based and profiling-based tools, that generate static and dynamic views of the system under analysis. The static view consists of static source artifacts extracted from the source code of the system. The dynamic view consists of dynamic elements. These extracted views are stored in a repository, which can be a relational database, and are then fused together into fused views. In the end, visualization tools are deployed in Dali to present the source model and the result of the system architecture analysis. An example is Rigi, which can be used to present systems as a graph whose nodes denote the artifacts and whose arcs represent the relations between them [6].

C. Top down approaches:

In these approaches reconstruction starts from prior high-level knowledge about the application domain, such as requirements and architectural styles, and then formulates a hypothesis that is verified against the source code. Figure 6 shows the top-down approach of software architecture reconstruction.
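
The top-down idea of checking a hypothesized architecture against the source code can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the module names, the file-to-module mapping and the extracted dependencies are invented, and the result categories (convergences, divergences, absences) are terminology borrowed from reflexion models rather than from the tools surveyed here.

```python
# Hypothesized architecture: the module-level dependencies the analyst expects.
HYPOTHESIS = {("ui", "core"), ("core", "db")}

# Hypothetical mapping from source files to hypothesized modules.
MAPPING = {"view.py": "ui", "logic.py": "core", "store.py": "db"}

# Hypothetical file-level dependencies, as they might be extracted from code.
SOURCE_DEPS = [
    ("view.py", "logic.py"),
    ("logic.py", "store.py"),
    ("view.py", "store.py"),
]


def check_hypothesis(hypothesis, mapping, source_deps):
    """Compare hypothesized module dependencies with those observed in code."""
    observed = {
        (mapping[src], mapping[dst])
        for src, dst in source_deps
        if mapping[src] != mapping[dst]
    }
    return {
        "convergences": hypothesis & observed,  # expected and found
        "divergences": observed - hypothesis,   # found but not expected
        "absences": hypothesis - observed,      # expected but not found
    }


result = check_hypothesis(HYPOTHESIS, MAPPING, SOURCE_DEPS)
for kind, edges in result.items():
    print(kind, sorted(edges))
```

Here the direct dependency from `view.py` to `store.py` shows up as a divergence, which is the kind of finding that would drive the refinement of the hypothesis in the next iteration.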


Fig. 6. Top down approach for SAR: hypothesized architecture (1), architecture conformance against source code (2), architecture refinement (3)

The term architecture discovery also describes this process [9] [3]. The following are top-down approaches [9].
• RM Tool
• Pulse
• W4

D. Hybrid approaches:

In this approach, top-down and bottom-up approaches are combined to reconstruct the architecture; the low-level information is abstracted to refine the high-level information. Hybrid approaches stop architectural erosion [9] [3]. Figure 7 shows the hybrid approach.

Fig. 7. Hybrid Approach

There are many hybrid approaches, but only Cacophony, Symphony and Nimeta are discussed in this paper because of their better results [9].

a) Cacophony: Cacophony is a metamodel-driven architecture reconstruction approach [8]. The model of a system gives a simplified view of the system. The model should be able to answer queries in the same way as the original system. A metamodel is also a model; it describes a way of representing the model. The representation of models and metamodels in Cacophony is shown in figure 8.

Fig. 8. Models and Metamodels in Cacophony

Cacophony involves several main steps [17]. In the first step, the application domain of the system is analyzed from the architectural point of view. In the second step, an inventory is maintained that holds the information gathered through interviews, slides, various documents, etc., and a raw mapping between concepts and information is given. In this step the information of interest is taken out of the inventory and the conceptual model is developed. The various conceptual metamodels developed from the information stored in the inventory are combined into one metamodel. In the next step the metamodel is analyzed again; the combined metamodels should be clustered cohesively, so that no dependency exists between them when they need to be analyzed separately. In the next step three things are developed: actor identification, use case identification and use case description. Actors are usually the stakeholders of the system. The stakeholders and the use cases are then combined to see and analyze where the gap in the system actually occurs; for this purpose several meetings and interviews are held to specify the problem. Next, specifications are highlighted from the requirements and the use case is passed through the metamodel. The software is then visualized, and finally its implementation, evaluation and evolution are carried out.

b) Symphony/Nimeta: In Symphony, viewpoints and views are used to construct the architecture reconstruction models [8].
• Viewpoints: These are mostly discussed at an abstract level by selecting a set of architectural concepts and rules. This is done to focus on a specific aspect of a system [8].
• Views: A view gives a representation of a system on the basis of a given viewpoint [8].

E. Views in Symphony:

• Source View: The view of the system that can be obtained from the source code.


• Target View: This is the final view, which contains the implementation information needed to solve the problem.
• Hypothetical View: It shows the present understanding of the architecture, which is mostly not correct.

There are two stages to be completed while reconstructing the architecture [19].
1) In the first stage, problem elicitation is carried out by communicating with stakeholders, and the problem is identified. The architectural concepts related to solving the problem are then revealed, and a proper recovery strategy for the problem is developed.
2) In the second stage, the specification needs of the architecture reconstruction are considered; the source view is mapped to the target views to solve the identified problem.

III. SOFTWARE QUALITY ATTRIBUTES

Quality attribute requirements specify the nonfunctional requirements of a software application, which capture many aspects of how the functional requirements of an application are achieved. Designers need to determine the following points when the architecture of the software is specified:
• The extent to which software architecture features can influence the quality attributes.
• The extent to which techniques can support or conflict with the attributes.
• The extent to which various quality attribute requirements can be fulfilled at the same time.

IV. QUALITY ATTRIBUTES DRIVEN SOFTWARE ARCHITECTURE RECONSTRUCTION

In [4], quality attribute driven evaluation for reconstructing the software architecture is introduced. This technique presents an analysis framework and illustrates the information about the software system. This information is used by the reconstruction method to relate the knowledge obtained back to the organization's business goals. The goal of this approach is to offer extensive information that contributes to analyzing the software quality attributes.

a) Application contexts: A few application contexts in which software architecture reconstruction can be applied for the analysis of architecture are [4]:
• To streamline current products into product lines.
• To assess existing systems.
• Decision making between rival existing systems.
• System reconstruction.

b) Quality attributes driven analysis framework: The analysis framework serves as a way to assess how far systems attain specific quality attribute goals, for example scalability and performance goals. The analysis framework serves the architecture reconstruction by making specific characteristics of existing software recognizable. The analysis is driven by business goals, expressed as quality attributes that should be evaluated on existing systems. Evaluations involve a systematic way to reason about the achievement of quality goals. We refer to this systematic way as a framework that helps the software architect to assess or design architectures. The quality attribute analysis framework is shown in figure 9.

Fig. 9. Analysis framework of Quality Attributes Driven

Quality attributes are refined into quality attribute "scenarios". A scenario is a requirement related to a quality attribute of the system. It consists mainly of 1) a stimulus and 2) a response. The stimulus acts like a signal to the system; when the signal reaches the system, the system generates a corresponding response. The quality attribute facts also describe the origin of the stimulus, that is, which procedure or component generates it, and how the response is taken into consideration [15]. A tactic in architectural reconstruction represents the association between design decisions and the response from quality attributes [3]. The Quality Attributes Driven Analysis Framework handles the information extracted from the existing system so that it can be used in the Quality Attribute Model with the required architecture elements. Architecture elements, properties, relations, and tactics are integrated under the model of architecture views. An architecture view represents a set of elements of the system and the relationships between them [16]. The steps of the quality attribute driven framework are shown in figure 10.

Fig. 10. The QADSAR Steps

Quality attribute driven software architecture reconstruction includes several phases. To activate the method, the Quality Attributes Driven Analysis Framework needs information about the architecture to perform the quality attribute analysis. Phase 1 defines the scope for software architecture


reconstruction. The scope identifies the architecture view types [8] and the system parts that need to be constructed. The identification depends on the quality attribute scenarios, the related quality attribute models, and the type of system.

Phase 2 of the approach involves extracting the source elements from the available resources. Source elements are the constructs of the implementation language, such as functions, classes, files, and directories. Relations define how the source elements are related to each other, such as call relations between functions or read accesses by methods on attributes. Besides static characteristics there are also dynamic characteristics, such as function execution time or process relations. The static relations are typically generated by existing tools such as source code parsers or lexical analyzers. Dynamic information is produced by profiling or code instrumentation techniques. The extracted elements and relations comprise the Source Model.

Phase 3 involves identifying and applying aggregation strategies to abstract the detailed views of the sources. There are many aggregation strategies, which depend greatly on the existing system and the architecture views that need to be extracted. Various techniques exist for manipulating relational information, such as Relation Partition Algebra and Tarski Algebra [13, 14, 19]. The aggregated elements constitute the Aggregation Model. The aggregation model consists of entities and relations that are collapsed. They might be associated with architecture elements, but they are not explicitly denoted as architecture elements with particular properties.

To acquire the necessary views of the architecture, in phase 4 we assign the types of the elements that are specified by the view type of the analysis framework. Elements are presented as layers, tasks, 'consist of' relations, etc. We next assign the required properties, such as throughput, deadlines for tasks, etc. Finally, we associate the tactics that are achieved with a particular set of architecture elements.

The outcomes of step 4 support the QAD Analysis Framework for step 5, Evaluation of Quality Attributes, which is performed with the particular quality attribute scenarios, quality attribute models, and the corresponding architecture tactics. The tactics are used to reconstruct the architectural views that support the quality attribute scenario.

V. INTERFACE IDENTIFICATION:

Interface identification is a reverse engineering technique in which the interfaces involved in software systems are identified by analyzing the source code. It is a bottom-up technique in which the externally visible components of a system can be identified; some externally visible data elements are also observed. This technique shows the interactions of various components. The source code of a system is broken down into small pieces of code, which are then gathered so that they act like a single entity [19].

VI. CLUSTER BASED ARCHITECTURE RECONSTRUCTION:

Clustering approaches, which are used in many disciplines, provide a grouping of related objects of a software system. The basic purpose of cluster analysis is to help understand the observations better and to construct complex knowledge structures from features and object clusters. Similar things are grouped into clusters so that similarity within a cluster is high and the clusters are largely independent, that is, the similarity or dependency between different clusters is low [2]. Clustering algorithms can be divided into two types, namely partition-based and hierarchical.

a) Partition clustering algorithms: Partition algorithms start with an initial partition consisting of a certain number of clusters. The partition is then amended at every step, and some criterion is optimized while the number of clusters is kept constant. Subclasses of partition algorithms include graph-theoretic, mode-seeking and mixture-resolving algorithms. In partition algorithms the number of clusters must be specified in advance, which can be difficult if we have no prior information about the data set. Additionally, partition clustering algorithms are not cost-effective, because partitioning the items leads to the creation of many clusters, which makes the algorithm expensive. To overcome the computational complexity of partition algorithms, researchers have proposed heuristic-based approaches to assist software architecture reconstruction.

b) Square error clustering algorithms: A square error algorithm starts with an initial division of the entities into a fixed number of clusters and iteratively moves entities between clusters to optimize some clustering measure. This measure denotes the quality of the clustering [21].

c) Graph-theoretic clustering algorithms: Graph-theoretic algorithms are partition algorithms that operate on graphs. The nodes of these graphs correspond to entities and the edges to relations between these entities. In general, graph algorithms try to split the graph into subgraphs that will form the clusters, instead of focusing on the entities themselves [20].

d) Hierarchical clustering algorithms: Hierarchical clustering is a clustering technique based on a hierarchical breakdown of nodes. Hierarchical algorithms can be further divided into agglomerative and divisive algorithms. In divisive algorithms the whole graph is initially taken as one single cluster. In further steps of the algorithm this cluster is divided hierarchically into smaller clusters until each vertex forms its own cluster. Agglomerative algorithms, by contrast, start with one cluster for each vertex in the graph. In each subsequent step, the two clusters with the highest similarity are combined to form a new cluster [20] [2]. The agglomerative algorithm stops when only one cluster is left.

e) Vertex similarity: This function defines the similarity of vertices. The graph consists of vertices and edges; if two vertices have similar properties, they have a strong bond between them and are assigned a higher priority value [21].
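
The agglomerative procedure and the role of a vertex similarity function can be made concrete with a small Python sketch. Several choices here are assumptions made for illustration and are not taken from the cited works: the entity names and feature sets are invented, similarity is measured with the Jaccard coefficient of the feature sets, and clusters are compared by single linkage (the most similar pair of members).

```python
from itertools import combinations

# Hypothetical entities and the features (e.g. referenced symbols) of each.
FEATURES = {
    "parser": {"token", "ast"},
    "lexer": {"token", "char"},
    "report": {"html", "chart"},
    "plot": {"chart", "axis", "html"},
}


def jaccard(a, b):
    """Vertex similarity: shared features divided by all features."""
    return len(a & b) / len(a | b)


def cluster_similarity(c1, c2, features):
    """Single linkage: similarity of the most similar pair of members."""
    return max(jaccard(features[x], features[y]) for x in c1 for y in c2)


def agglomerate(features):
    """Merge the two most similar clusters until only one cluster is left."""
    clusters = [frozenset([name]) for name in sorted(features)]
    merges = []
    while len(clusters) > 1:
        a, b = max(
            combinations(clusters, 2),
            key=lambda pair: cluster_similarity(pair[0], pair[1], features),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append(a | b)
    return merges


merges = agglomerate(FEATURES)
for step, cluster in enumerate(merges, start=1):
    print(step, sorted(cluster))
```

The list of merges is the hierarchy read from the bottom up: early merges join closely related entities, and an analyst would typically cut the hierarchy at some level rather than use the final all-in-one cluster.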


VII. COMPARATIVE ANALYSIS

Both top-down and bottom-up approaches are used in the architecture reconstruction process; these approaches have advantages but also many drawbacks. The drawbacks of bottom-up approaches are that they are mostly manual, consume much time, and work only in a particular domain, since knowledge of the specific domain is mostly used as input. The drawback of clustering is that the algorithms are automated but their verification is manual. The drawback of top-down approaches is that they generate many views at a time during exploration, which makes it hard to find the views of interest; a view of interest can only be found by analyzing each view separately [24].

ARMIN extracts information from the code in Rigi Standard Format [14]. The big advantage of ARMIN over the other bottom-up software reconstruction techniques is that when more than one view is generated it also stores the previous views, which other tools cannot do [11][12][14].

With Rigi, the information gathered from software is manipulated and visualized. Rigi contains an interpreter that applies operations to the visuals extracted from the software. Nodes can be selected or removed manually. Rigi includes parsers that deliver the extracted information in Rigi Standard Format. Dali is a collection of many tools. Dali extends Rigi: in Rigi only the visual representation of the extracted information is shown, whereas in Dali queries can be applied to the data generated for a view of the system. In Dali more than one view is generated at a time. ARMIN is a further extension of Dali. It combines the capabilities of both Rigi and Dali, but has the advantage over both that it not only generates many views but can also store the previous views [25].

VIII. CONCLUSION

In this paper we briefly analyzed different approaches to the software architecture reconstruction process. Among all approaches, the bottom-up approach appears to be the most appropriate for reconstructing the architecture, because top-down and hybrid approaches at certain points lead to the bottom-up approach. ARMIN is the most appropriate tool for performing architecture reconstruction because it combines the aspects of all the other tools and is easy to use. As future work following this survey, architecture reconstruction will be performed practically using ARMIN.

REFERENCES

[1] Damien Pollet, Stéphane Ducasse, Loïc Poyet, Ilham Alloui, Sorana Cîmpan, Hervé Verjus, Towards A Process-Oriented Software Architecture Reconstruction Taxonomy, LISTIC, University de Savoie, France, july-aug-2009.
Proceeding WCRE '03 Proceedings of the 10th Working Conference on Reverse Engineering
[5] http://www.sei.cmu.edu/architecture/research/previousresearch/reconstruction.cfm. Liam O'Brien, Software Engineering Institute, Carnegie Mellon University, 4500 Fifth Avenue, Pittsburgh; Chris Verhoef, Free University of Amsterdam, De Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands.
[6] René L. Krikhaar, Software Architecture Reconstruction.
[7] Claudio Riva, View-based Software Architecture Reconstruction. Institute of Information Systems, Distributed Systems Group, Vienna, Austria.
[8] Vijaya Datta Mayyuri, Software Architecture Reconstruction, Symphony, Cacophony. Reverse Engineering, 2004. Proceedings. 11th Working Conference, 8-12 Nov, 2004.
[9] Mircea Lungu, The 5 questions you always asked yourself about Software Architecture Recovery, Faculty of Informatics, University of Lugano, October 2008.
[10] Ian Gorton, Liming Zhu, Tool Support for Just-in-Time Architecture Reconstruction and Evaluation: An Experience Report. Proceedings of the 27th international conference on Software engineering, 2005.
[11] Liam O'Brien, Christoph Stoermer, Architecture Reconstruction Case Study, April 2003.
[12] Liam O'Brien, Vorachat Tamarree, Architecture Reconstruction of J2EE Applications: Generating Views from the Module View type, November 2003.
[13] Rick Kazman, Liam O'Brien, Chris Verhoef, Architecture Reconstruction Guidelines, Third Edition, Software Engineering Institute, November 2003.
[14] Liam O'Brien, Dennis Smith, Grace Lewis, Supporting Migration to Services using Software Architecture Reconstruction, Software Technology and Engineering Practice, 2005. 13th IEEE International Workshop.
[15] Bass, L.; Clements, P. and Kazman, R., Software Architecture in Practice, Third Edition, Addison Wesley, October 5, 2012.
[16] Clements, P.; Bachmann, F.; Bass, L.; Garlan, D.; Ivers, J.; Little, R.; Nord, R. and Stafford, J., Documenting Software Architectures: Views and Beyond, Addison Wesley, October 6, 2002.
[17] Jean-Marie Favre, Cacophony: Metamodel-Driven Software Architecture Reconstruction, University of Grenoble, France.
[18] van Deursen, A. (CWI & Delft Univ. of Technol., Netherlands); Hofmeister, Christine; Koschke, R.; Moonen, L.; Symphony: View-Driven Software Architecture Reconstruction. Software Architecture, 2004. WICSA 2004. Proceedings. Fourth Working IEEE/IFIP Conference on, 12-15 June 2004.
[19] Michael W. Barton, Richard C. Chapman, A Technique to Identify Component Interfaces.
[20] Chung-Horng Lung, Software Architecture Recovery and Restructuring through Clustering Techniques, Software Engineering Analysis Lab, Nortel, Proc. of the 3rd International Software Architecture Workshop (ISAW), 1998.
[21] Niels Streekmann, Clustering-Based Support for Software Architecture Restructuring, Vieweg+Teubner Verlag, 2011 edition (December 14, 2011).
[22] Kenny Wong, Rigi User's Manual, Version 5.4.4, June 30, 1998.
[23] Holger M. Kienle, Hausi A. Muller, The Rigi Reverse Engineering Environment, University of Victoria, Canada.
[24] Mrs. S. Rajeshwari, Mr. Telugu Manohar, Software Reconstruction
[2] Ioana Sora, Gabriel Glodean, Mihai Gligor, Software Architecture Techniques For Software Architecture published in International Journal
Reconstruction: An Approach Based on Combining Graph Clustering and of Advanced Trends in Computer Science and Engineering, , 2013.
Partitioning. Proceedings of the (ICCC-CONTI), 2010 International Joint [25] Carnegie Mellon Liam O'Brien Christoph Stoermer Chris Verhoef
Conference on 27-29 May 2010 Department of Computers Politehnica Software Architecture Reconstruction: Practice Needs and Current
University of Timisoara Approaches Publisher, Software Engineering Institute August 2002
[3] Stéphane Ducasse Damien Pollet. Software Architecture Reconstruction:
a Process-Oriented Taxonomy July/August 2009 (vol. 35 no. 4)
[4] Christophe Stoermer, liam O‟brien , Chris Verhoe, Moving Towards
Quality Attribute Driven Software Architecture Reconstruction.


Zigbee Routing Opnet Simulation for a Wireless Sensors Network
ELKISSANI Kaoutar, Pr. Moughit Mohammed, Pr. Nasserdine Bouchaib
Labo IR2M, FST Settat, Université Hassan 1er, Settat, Morocco

Abstract—Wireless sensor networks (WSN) are nowadays considered a viable solution for medical applications. A ZigBee network model is well suited to the battery capacity, bandwidth, and computing limitations of a WSN. This paper presents an OPNET simulation of ZigBee network performance in order to compare routing results in 3 different topologies (Star, Mesh and Tree).

Keywords—WSN; Zigbee; routing; Opnet

I. INTRODUCTION

The miniaturization of sensors, their increasingly low cost, the broad range of sensor types available, and wireless communication support allow sensor networks to develop in several application areas. They also make it possible to extend existing applications. Sensor networks can be very useful in many applications where data coming from the environment must be collected and processed. Among the fields where these networks can offer the best contributions are military, monitoring, environmental, medical, domestic, and commercial applications.

We could imagine that in the future, monitoring the vital functions of the human being will be possible thanks to microsensors that could be swallowed or implanted under the skin [1]. Micro-cameras that can be swallowed already exist. They are able, without recourse to surgery [4], to transmit images of the interior of a human body with a 24-hour endurance [5]. Other ambitious biomedical applications have also been presented, such as monitoring of the glucose level, monitoring of the vital organs, or the detection of cancers. The use of sensor networks in medicine could bring permanent monitoring of patients and the possibility of collecting physiological information of better quality, thus facilitating the diagnosis of some diseases [6].

II. ZIGBEE:

ZigBee is an LP-WPAN (Low Power-Wireless Personal Area Network): a wireless network with short range and low power consumption. It is characterized by a range of a few hundred meters and a low data rate (250 kbit/s max) [2]. The standard was designed to interconnect embedded units such as sensors.

It is based on the IEEE 802.15.4 standard for the physical and data link layers and proposes its own upper layers (network, etc.) [2]. The difference between ZigBee and most other WPANs is the use of the medium: ZigBee is optimized for light use of the medium shared by all, for example 0.1% of the time.

Typically, a ZigBee transceiver module will occupy the medium for a few milliseconds while transmitting, possibly wait for an answer or an acknowledgment, and then remain in standby for a long period before the next transmission, which takes place at a predetermined moment.

This need introduces interesting research problems, in particular at the data link layer (delay, storage, and access to the medium) and at the network layer (routing that respects energy constraints). ZigBee defines two types of network entities: the FFD (Full Function Device) implements the entire specification, and the RFD (Reduced Function Device) is a reduced entity aimed at lower power consumption and lower memory use for the microcontroller.

RFDs are necessarily end nodes of the network because they do not implement a routing mechanism. Typically, an embedded sensor will be an RFD supplied by batteries, whereas a central processing unit, supplied by a source not subject to an energy constraint (mains powered), is an FFD with the routing function.

Following IEEE 802.15.4, ZigBee can work on three frequency bands: 868 MHz (Europe), 915 MHz (North America) and 2.4 GHz (worldwide). The standard defines two different physical layers (PHY), one for 868/915 MHz (PHY868/915) and a second for 2.4 GHz (PHY2450), implementing spread spectrum modulation.

A. Zigbee protocol:

The ZigBee stack is composed of several layers, including the physical layer (PHY), the MAC layer, the network layer (NWK), the application support sublayer (APS) and the ZigBee Device Object (ZDO). The following figure shows the ZigBee stack with its layers.
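To make the 0.1% medium-occupancy figure mentioned in this section concrete, the following minimal sketch computes how long a module has to wait between transmissions to stay within a target duty cycle. The 5 ms airtime and the function names are illustrative assumptions; they are not taken from the standard or from the simulation.

```python
def min_period_s(airtime_s: float, target_duty_cycle: float) -> float:
    """Shortest transmission period that keeps medium use at or below the target."""
    if not 0 < target_duty_cycle <= 1:
        raise ValueError("target_duty_cycle must be in (0, 1]")
    return airtime_s / target_duty_cycle


def duty_cycle(airtime_s: float, period_s: float) -> float:
    """Fraction of time the medium is occupied by one module."""
    return airtime_s / period_s


# Assumed example: one 5 ms emission per reporting period, 0.1% target.
airtime = 0.005
period = min_period_s(airtime, 0.001)
print(f"transmit at most once every {period:.1f} s")
print(f"resulting duty cycle: {duty_cycle(airtime, period):.4%}")
```

Under these assumed numbers the module may transmit roughly once every five seconds and can sleep for the rest of the period, which is the behaviour the section describes.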


Fig. 1. Zigbee Protocol

The physical layer (PHY) defines the physical operation of ZigBee equipment, including the receiver sensitivity, the number of channels, the transmission power, the modulation, and the transmission rate specifications. The MAC layer manages RF data transactions between neighbouring nodes (point-to-point). This layer includes services such as the management of retransmissions and acknowledgments, as well as the CSMA-CA collision avoidance technique.

The network layer (NWK) adds routing capabilities that allow RF data to cross several devices (multiple hops) in order to route the data from the source to the destination (peer to peer).

This layer also manages the mechanisms for neighbor discovery, route discovery and maintenance, joining or leaving the network, etc.

Application support (APS) is an application layer that defines various addressing objects, including profiles, clusters and end devices.

ZigBee Device Object (ZDO) is the application layer that provides device and service discovery functions; it also includes advanced network management capabilities. It also defines the role of the nodes in the network, for example coordinator or end device.

Security Services Provider (SSP) manages MAC security for MAC frames, network security for NWK command frames, and security for APS frames. The features of this layer include authentication, encryption, message integrity, etc.

B. Topology:

The IEEE 802.15.4 standard defines two topologies: star (all the nodes communicate with a central node called the coordinator) and point-to-point (peer to peer: all the nodes within radio range can communicate with each other without hierarchy). The network formed is called a PAN [2]. The ZigBee network layer allows the creation of a mesh topology thanks to automatic routing.

Three topologies can be considered when deploying a ZigBee network:

• Star topology
• Tree topology
• Mesh topology

The star topology (figure 2) is the simplest and most limited of all ZigBee topologies. It is made up of a central device (the coordinator) and the other devices of the network (routers, end devices). Each device in the network can communicate only with the coordinator. Consequently, to send a packet from one device to another, the packet must pass through the coordinator, which forwards it to the destination [2].

Fig. 2. Star topology

One disadvantage of the star topology is that there is no alternate route if the link between the coordinator and the end device fails.

The other disadvantage of this topology is that all packets must pass through the coordinator, which can become saturated by a large number of packets, resulting in a congested network.

The tree topology (figure 3) is made up of a coordinator to which other devices are connected. The coordinator is linked to several routers and end devices (its children). A router can in turn be connected to several routers and end devices, and this can continue down to a certain number of levels. This hierarchy can be visualized as a tree structure with the coordinator at the top.

Fig. 3. Tree topology

A router can be used as an end device in the tree, but in this case its message diffusion function is not used. In a tree topology, the coordinator and the routers can have children and can therefore be parents. End devices, on the other hand, cannot be parents and so cannot have children.

Children can communicate only with their parents, while parents can communicate with their children and with their own parent. The disadvantage of this topology is that there is no alternate route if a link needed to reach the destination fails.
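The parent/child rule of the tree topology can be sketched as follows: a packet climbs from the source to the closest common ancestor and then descends to the destination, and a single broken parent-child link leaves no alternative. This is a simplified illustration only; the node names, the `parent` table and the `tree_route` helper are hypothetical, and the sketch does not model ZigBee's address-based tree routing.

```python
def path_to_root(node, parent):
    """Return the chain node -> parent -> ... -> coordinator."""
    chain = [node]
    while parent[chain[-1]] is not None:
        chain.append(parent[chain[-1]])
    return chain


def tree_route(src, dst, parent, failed_links=frozenset()):
    """Route src -> dst through the closest common ancestor.

    Returns the list of nodes on the path, or None when a required
    parent-child link is down (a tree offers no alternate route).
    """
    up = path_to_root(src, parent)
    down = path_to_root(dst, parent)
    ancestors = set(up)
    # First ancestor of dst that is also an ancestor of src.
    meet = next(n for n in down if n in ancestors)
    route = up[: up.index(meet) + 1] + down[: down.index(meet)][::-1]
    for a, b in zip(route, route[1:]):
        if frozenset((a, b)) in failed_links:
            return None
    return route


# Hypothetical tree: coordinator C, routers R1 and R2, end devices E1..E3.
parent = {"C": None, "R1": "C", "R2": "C", "E1": "R1", "E2": "R1", "E3": "R2"}

route = tree_route("E1", "E3", parent)
hops = len(route) - 1
blocked = tree_route("E1", "E3", parent, failed_links={frozenset(("R2", "C"))})
print(route, hops, blocked)
```

In this small example two end devices under different routers are four hops apart, while siblings under the same router are two hops apart; the hop count grows with the depth of the tree.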


The mesh topology (figure 4) has a structure similar to that of the tree, with a coordinator at the top. In a mesh topology, the coordinator is linked to its children (routers, end devices), which can in turn be linked to several routers and end devices (their children).

However, the communication rules are more flexible because routers can communicate directly with one another. A mesh topology is characterized by more effective propagation of packets [2]: alternate routes can be found if a link breaks or if there is congestion. Route discovery is provided so that the network can find the best available path to convey the packet.

Fig. 4. Mesh topology

III. OPNET SIMULATION:

A. Simulation

To simulate a ZigBee network, OPNET provides device models for ZigBee coordinators, routers and end devices. The main goal of the network simulation is to analyze the performance of a ZigBee network in a WSN context. A WSN can span from a few meters to several thousand meters; for example, agricultural and environmental applications often extend over long distances, while residential applications can be much smaller [6]. In addition, some WSNs use only a few sensors as end devices, while others employ hundreds and sometimes even thousands of devices. ZigBee operates several protocols in order to determine the optimal path for routing packets. This section discusses the results of OPNET simulations of 3 different topologies (tree, mesh and star) in order to compare them and see which is more suitable for a WSN in the medical field, depending on the network requirements.

Fig. 5. Simulation scenario

B. Results:

Number of hops (figure 6) is the average number of hops traveled by application traffic in the PAN. It is the number of times a packet travels from the source through the intermediate nodes to reach the destination.

The number of hops for the star topology is equal to 2, which means that between the source and the random destination there is one intermediate node that relays the data (the coordinator).

The number of hops for the tree topology is equal to 5, as we set the network depth to 4; the mesh topology uses a routing table.

Fig. 6. Number of hops simulation

End-to-end delay (figure 7) is a measurement of the network delay on a packet; it is measured as the time interval between the moment a message is queued for transmission at the physical layer and the moment the last bit is received at the receiving node.

Among the end-to-end delay results for the 3 topologies, star and mesh have close end-to-end delays in this simulation. The end-to-end delay of the tree is higher by more than 50%.

Fig. 7. End to End delay simulation
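The hop-count metric can be illustrated outside OPNET with a plain breadth-first search over a link list. The sketch below is not the simulator's model: the topologies, node names and the `hop_count` helper are hypothetical, chosen only to show why a star always needs two hops between devices and why a mesh can survive a broken link.

```python
from collections import deque


def hop_count(links, src, dst):
    """Breadth-first search: fewest hops from src to dst, or None if unreachable."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == dst:
            return hops
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None


# Hypothetical star: every device talks only to coordinator C.
star = [("C", "A"), ("C", "B"), ("C", "D")]
# Hypothetical mesh: same devices, plus direct device-to-device links.
mesh = star + [("A", "B"), ("B", "D")]

star_hops = hop_count(star, "A", "D")   # relayed by the coordinator
mesh_hops = hop_count(mesh, "A", "D")
# Drop the C-D link: the star loses D, the mesh reroutes through B.
star_cut = hop_count([l for l in star if l != ("C", "D")], "A", "D")
mesh_cut = hop_count([l for l in mesh if l != ("C", "D")], "A", "D")
print(star_hops, mesh_hops, star_cut, mesh_cut)
```

Averaging `hop_count` over many random source and destination pairs would give a figure comparable in spirit to the "number of hops" statistic, though the values reported in this paper come from the OPNET models, not from this sketch.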


IV. CONCLUSION
From all the results, it can be concluded that tree routing, even if it presents the lower end-to-end delay, is less suitable for WSN because of its number-of-hops results, which mean more energy consumption. Our future work will be a more detailed study of energy efficiency and reliability. The major goal is to develop an energy-aware protocol for a medical WSN application.
REFERENCES
[1] H. Alemdar and C. Ersoy, "Wireless Sensor Networks for Healthcare: A Survey," Computer Networks, Vol. 54, No. 15, pp. 2688-2710, October 2010.
[2] P. Baronti, P. Pillai, V. Chook, S. Chessa, A. Gotta, Y.F. Hu, "Wireless Sensor Networks: a Survey on the State of the Art and the 802.15.4 and ZigBee Standards," Computer Communications, Vol. 30, No. 7, pp. 1655-1695, 2007.
[3] M. Chen, S. Gonzalez, A. Vasilakos, H. Cao and V. C. M. Leung, "Body Area Networks: A Survey," Mobile Networks and Applications, Springer, 2010.
[4] Emil Jovanov, Aleksandar Milenkovic, Chris Otto and Piet C. de Groen, "A wireless body area network of intelligent motion sensors for computer assisted physical rehabilitation."
[5] A. Milenkovic, C. Otto, E. Jovanov, "Wireless sensor networks for personal health monitoring: Issues and an implementation."
[6] "A Study of ZigBee Network Topologies for Wireless Sensor Network with One Coordinator and Multiple Coordinators," Tikrit Journal of Engineering Sciences, Vol. 19, No. 4, December 2012.
