Main Report

Acknowledgement
I wish to thank Dr. S. C. Pilli, Principal, KLE's College and Technology,
Belgaum, for his inspiration and guidance.
I thank my guide, Rashmi G, for her encouragement and moral
support. Her highly encouraging nature has always inspired me to work
harder. Her advice, both technical and personal, has been truly indispensable.
I would like to thank our beloved HOD, Dr. Nandini Sidnal, Department
of Computer Science and Engineering, Belgaum, for her guidance and
support during my dissertation work. I am highly indebted to her for her
constant encouragement, advice and ideas during the different phases of my
project work.
Finally, I would like to acknowledge the support of my friends and the other
teaching and non-teaching staff members of our department for providing
help in every form. Last but not least, I would like to thank
my family members, without whose love and support this would never have
been possible.
Abstract
Cloud computing is an emerging technology that is expected to reshape
information technology. With the increasing popularity of cloud computing,
there is increased motivation to outsource data services to the cloud to save
money. An important problem in the cloud environment is protecting users'
privacy while they query data from the cloud. Researchers have proposed
several techniques to address this problem; however, existing techniques
incur heavy computational and bandwidth costs. In this work, a
scheme named aggregation and distribution layer (ADL) is proposed
along with efficient information retrieval for ranked query (EIRQ) to further
reduce the communication costs incurred in the cloud environment. Queries
are grouped into multiple ranks, so that a query with a higher rank can
retrieve a greater percentage of the matching files. Users are also
allowed to enter a file name if they know the exact file to be
retrieved. This helps avoid retrieving unnecessary files that would
have been fetched if only keywords were used.
Contents
1 Introduction
1.1 Cloud computing
1.1.1 What is cloud?
1.1.2 What is cloud computing?
1.1.3 Classification of services provided by cloud
1.1.4 Types of clouds
1.2 Problem Statement
1.3 Aim and objective
1.4 Overview of project
1.5 Issues and Challenges
2 Literature Survey
2.1 Previous Research Work
2.2 Existing System
2.3 Proposed System
3 Software Requirement Specification
3.1 Overall Description
3.1.1 Product Perspective
3.1.2 Product Functions
3.1.3 User Characteristics
3.1.4 Pre-requisites
3.1.5 Assumptions and Dependencies
3.2 Specific Requirements
3.2.1 Functional Requirement
3.2.1.1 Purpose
3.2.1.2 Process
3.2.2 Non Functional Requirement
3.2.3 Software Requirements
3.2.4 Hardware Requirements
4 System Design
4.1 Design Considerations
4.2 Development Method
4.3 System Architecture
4.4 Flow charts
4.4.1 Flowchart for User
4.4.2 Flowchart for ADL
4.4.3 Flowchart for Admin
4.4.4 Flowchart for Cloud
4.5 Data Flow Diagram
4.6 Use case Diagram
4.6.1 Use case Diagram for Users
4.6.2 Use case Diagram for ADL
4.6.3 Use case Diagram for Administrator
4.6.4 Use case Diagram for Cloud
4.7 Sequence Diagram
5 Implementation
5.1 Implementation Requirements
5.2 Selection of the Platform
5.3 Selection of Language
5.3.1 Java Technology
5.3.1.1 How Java Technology Works
5.3.1.2 How Java Technology Changes Our Life
5.3.2 JavaScript
5.3.2.1 Few things we can do with JavaScript
5.3.2.2 Difference between JavaScript and Java
5.3.3 JDK
5.3.4 MySQL
5.3.4.1 Distinct Features of MySQL
5.3.4.2 FEATURES OF MYSQL
5.3.5 HTML5
5.3.5.1 Features
5.4 Overview of Module Description
5.4.1 Efficient Information Retrieval for Ranked Query
5.4.2 Aggregation and Distribution Layer
5.4.3 Ranked Queries
5.4.4 User Privacy
5.5 Working of modules
5.6 Snap Shots
5.6.1 Home page
5.6.2 Organisation Login
5.6.3 Organisation Home
5.6.4 User Details
5.6.5 File Upload
5.6.6 Cloud Login
5.6.7 Cloud Home
5.6.8 Cloud Queries
5.6.9 ADL Login
5.6.10 ADL Home
5.6.11 User Sign Up
5.6.12 User Home
5.6.13 User Login
5.6.14 Ostrovsky Query
5.6.15 Ostrovsky Result
5.6.16 EIRQ Query
5.6.17 EIRQ Result
6 Testing
6.1 Introduction to Testing
6.2 Approach
6.3 Test Environment
6.4 Unit testing
6.4.1 Unit test cases
6.5 Integration Testing
6.6 System Testing
List of Figures
1.1 Cloud Computing Architecture
1.2 Classification of services provided by cloud
1.3 Type of cloud
1.4 Application scenario
4.1 System architecture
4.2 Flowchart for user
4.3 Flowchart for ADL
4.4 Flowchart for Admin
4.5 Flowchart for Cloud
4.6 Use case for User
4.7 Use case for ADL
4.8 Use case for Administrator
4.9 Use case for Cloud
4.10 Sequence diagram
5.1 Working of java
5.2 Working process of EIRQ
5.3 Home page
5.4 Organisation login
5.5 Organisation home
5.6 User details
5.7 File upload
5.8 Cloud login
5.9 Cloud Home
5.10 Cloud queries
5.11 ADL login
5.12 ADL home
5.13 User sign up
5.14 User home
5.15 User login
5.16 Ostrovsky query
5.17 Ostrovsky result
5.18 EIRQ query
5.19 EIRQ Result
List of Tables
6.1 Unit testing
List of Algorithms
5.1 Matrix construct
5.2 File filter
Chapter 1
Introduction
1.1 Cloud computing
Cloud computing is the delivery of computing services over the Internet.
Cloud services allow individuals and businesses to use software and hardware
that are managed by third parties at remote locations. Examples of
cloud services include online file storage, social networking sites, webmail,
and online business applications. The cloud computing model allows access
to information and computer resources from anywhere that a network connection
is available. Cloud computing provides a shared pool of resources,
including data storage space, networks, computer processing power, and
specialized corporate and user applications.
The following definition of cloud computing has been developed by the
U.S. National Institute of Standards and Technology (NIST) [2]: Cloud computing
is a model for enabling convenient, on-demand network access to a
shared pool of configurable computing resources (e.g., networks, servers,
storage, applications, and services) that can be rapidly provisioned and
released with minimal management effort or service provider interaction. This
cloud model promotes availability and is composed of five essential
characteristics, three service models, and four deployment models.
Efficient Information Retrieval in Cloud Environment with Privacy Preserving
1.1.1 What is cloud?
The "cloud" is a set of different types of hardware and software that work
collectively to deliver many aspects of computing to the end-user as an
online service.
1.1.2 What is cloud computing?
Cloud Computing is the use of hardware and software to deliver a service
over a network (typically the Internet). With cloud computing, users can
access files and use applications from any device that can access the Internet.
An example of a Cloud Computing provider is Google's Gmail. Gmail users
can access files and applications hosted by Google via the Internet from any
device.
Figure 1.1: Cloud Computing Architecture
1.1.3 Classification of services provided by cloud
Based upon the services offered, clouds are classified in the following ways:
Infrastructure as a Service (IaaS) involves offering hardware-related
services using the principles of cloud computing. These could include
some kind of storage service (database or disk storage) or virtual
servers. Leading examples of Infrastructure as a Service offerings are
Amazon EC2, Amazon S3, Rackspace Cloud Servers and Flexiscale.
Platform as a Service (PaaS) involves offering a development platform
on the cloud. Platforms provided by different vendors are typically not
compatible with each other. Typical players in PaaS are Google App Engine
and Microsoft Azure.
Software as a Service (SaaS) includes a complete software offering
on the cloud. Users can access a software application hosted by the
cloud vendor on a pay-per-use basis. This is a well-established sector.
The pioneer in this field has been Salesforce.com's offering in the
online Customer Relationship Management (CRM) space. Other examples
are online email providers like Google's Gmail and Microsoft's
Hotmail, Google Docs, and Microsoft's online version of Office called
BPOS (Business Productivity Online Standard Suite).
Dept. of Computer Science & Engg, KLE Dr.MSSCET, Belgaum
The above classification is well accepted in the industry. David Linthicum
describes a more granular classification on the basis of the services provided.
These are listed below:
Storage-as-a-service
Database-as-a-service
Information-as-a-service
Process-as-a-service
Application-as-a-service
Platform-as-a-service
Integration-as-a-service
Security-as-a-service
Management/Governance-as-a-service
Testing-as-a-service
Infrastructure-as-a-service
Figure 1.2: Classification of services provided by cloud

1.1.4 Types of clouds
Public cloud: In a public cloud, the computing infrastructure is hosted by
the cloud vendor at the vendor's premises. The customer has no visibility
into or control over where the computing infrastructure is hosted.
The computing infrastructure is shared among multiple organizations.
Private cloud: The computing infrastructure is dedicated to a particular
organization and is not shared with other organizations. Some experts
consider that private clouds are not real examples of cloud computing.
Private clouds are more expensive and more secure than
public clouds.
Private clouds are of two types: on-premise private clouds and externally
hosted private clouds. Externally hosted private clouds are also
used exclusively by one organization, but are hosted by a third party
specializing in cloud infrastructure. Externally hosted private clouds
are cheaper than on-premise private clouds.
Hybrid cloud: Organizations may host critical applications on private
clouds and applications with fewer security concerns on the
public cloud. Using private and public clouds together is
called a hybrid cloud.
Community cloud: Computing infrastructure is shared among organizations
of the same community. For example, all government
organizations within the state of California may share computing
infrastructure on the cloud to manage data related to citizens residing
in California.
Figure 1.3: Type of cloud

1.2 Problem Statement
Due to the overwhelming merits of cloud computing, such as scalability,
cost-effectiveness and flexibility, more and more organizations are willing
to outsource their data to the cloud. However, the benefits of using the
cloud (lower operating costs, elasticity and so on) come with a trade-off:
users have to entrust their data to a potentially untrustworthy cloud
provider. As a result, cloud security has become an important problem for
both industry and academia. One important security problem is the potential
privacy leakage that may occur when outsourcing data to the cloud.
When users want to search for files, they send a query with certain keywords
to the cloud. The cloud evaluates the query and returns
the matching files to the users. During this process, the cloud learns
which files the user is interested in by observing the query and the
files returned to that user. Preventing this type of information from leaking
to the cloud is difficult, since the cloud must have access to the information
to efficiently return the appropriate files to the users. Several methods have
been proposed to address this; however, no efficient solution exists yet.
Hence, the proposed project focuses on providing user privacy
and reducing the communication cost of accessing files from the
cloud.
1.3 Aim and objective
The aim of the project is to make private search practical in a cloud
environment by reducing computational and communication costs. The
project also focuses on providing user privacy, which can be divided
into search privacy and access privacy: the cloud should learn neither what
the user is searching for nor which files are returned to the user.

1.4 Overview of project
For instance, consider the application scenario shown in Fig. 1.4.
In the traditional method, an organization subscribes to cloud services and
authorizes its staff to save files in the cloud. Each file is described by a
set of keywords when it is uploaded, and authorized users
can retrieve files of interest by querying the cloud with relevant
keywords. Since the cloud is operated by a third party, there have been
concerns over the possible privacy leaks that may occur, and such concerns
have led researchers to propose various techniques to protect user privacy.
Moreover, if multiple queries can be combined, the server has fewer queries
to process, which reduces the overhead.
Figure 1.4: Application scenario
Hence, to address the above problems, a proxy server called the aggregation
and distribution layer (ADL) is introduced between the
cloud and the users. The users send their queries to the ADL instead of
directly to the cloud, and the ADL queries the cloud on behalf of the users.
Thus, the cloud needs to execute the aggregated query only once to return the
files matching all queries to the ADL. With the ADL, the computation cost
incurred on the cloud can be largely reduced, since the cloud only needs to
execute a combined query once, no matter how many users are executing
queries. Furthermore, the communication cost incurred on the cloud is
also reduced, since files shared by several users need to be returned only
once. Each user can also decide how many matched files should be returned;
allowing users to retrieve matched files according to their demand reduces
the overall bandwidth consumed while accessing files from the cloud.
In the scenario of Fig. 1.4, files F1, F2, and F3 stored in the cloud are
described with keywords Wireless sensor network, Coverage, Cloud Computing,
Security, and Cryptography, respectively. Alice uses keywords Wireless
sensor networks, Cloud Computing, and Bob uses keywords Cryptography,
Security to query data from the cloud.
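The aggregate-query-distribute cycle of the ADL described above can be sketched in Java. This is a minimal plaintext illustration, not the project's implementation: it assumes a cloud that returns every file matching any keyword of the combined query and omits all privacy-preserving encryption. The class and method names (`AdlSketch`, `aggregate`, `cloudSearch`, `distribute`) and the file-to-keyword assignment are hypothetical.

```java
import java.util.*;

/** Hypothetical plaintext sketch of the aggregation and distribution layer (ADL). */
public class AdlSketch {

    /** Combines all user queries into one keyword set (their union). */
    static Set<String> aggregate(Map<String, Set<String>> userQueries) {
        Set<String> combined = new TreeSet<>();
        for (Set<String> keywords : userQueries.values()) {
            combined.addAll(keywords);
        }
        return combined;
    }

    /** Stand-in for the cloud: evaluates the combined query once over all files. */
    static Map<String, Set<String>> cloudSearch(Map<String, Set<String>> files,
                                                Set<String> combined) {
        Map<String, Set<String>> matched = new TreeMap<>();
        for (Map.Entry<String, Set<String>> f : files.entrySet()) {
            if (!Collections.disjoint(f.getValue(), combined)) {
                matched.put(f.getKey(), f.getValue()); // each matching file returned once
            }
        }
        return matched;
    }

    /** Gives each user only the returned files that match that user's own keywords. */
    static Map<String, List<String>> distribute(Map<String, Set<String>> userQueries,
                                                Map<String, Set<String>> matched) {
        Map<String, List<String>> perUser = new TreeMap<>();
        for (Map.Entry<String, Set<String>> q : userQueries.entrySet()) {
            List<String> mine = new ArrayList<>();
            for (Map.Entry<String, Set<String>> f : matched.entrySet()) {
                if (!Collections.disjoint(f.getValue(), q.getValue())) {
                    mine.add(f.getKey());
                }
            }
            perUser.put(q.getKey(), mine);
        }
        return perUser;
    }

    public static void main(String[] args) {
        // Hypothetical files and keyword assignment, for illustration only.
        Map<String, Set<String>> files = new TreeMap<>();
        files.put("F1", Set.of("Wireless sensor networks", "Coverage"));
        files.put("F2", Set.of("Cloud Computing", "Security"));
        files.put("F3", Set.of("Cryptography"));

        Map<String, Set<String>> queries = new TreeMap<>();
        queries.put("Alice", Set.of("Wireless sensor networks", "Cloud Computing"));
        queries.put("Bob", Set.of("Cryptography", "Security"));

        Set<String> combined = aggregate(queries);                 // one query to the cloud
        Map<String, Set<String>> matched = cloudSearch(files, combined);
        System.out.println(distribute(queries, matched));          // {Alice=[F1, F2], Bob=[F2, F3]}
    }
}
```

In this toy run, F2 matches both Alice's and Bob's queries but is returned by the cloud only once; the ADL then hands a copy to each user, which is where the communication saving comes from.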
1.5 Issues and Challenges
When the users want to search for some files, they will send a query to the
cloud with certain keywords. The cloud will evaluate the query and return
the necessary files to the users. During this process, the cloud will know
what files the user is interested in from observing the query and the type of
the files returned to that user. Preventing a leak of this type of information
to the cloud is difficult since the cloud must have access to the information
to efficiently return the appropriate files to the users.

Chapter 2
Literature Survey
A literature survey is the most important step in the software development
process. Before developing the tool, it is necessary to determine the time factor,
economy and company strength. Once these are satisfied, the next
step is to determine which operating system and language can be used
for developing the tool. Once programmers start building the tool, they
need a lot of external support, which can be obtained
from senior programmers, books or websites. The above considerations
were taken into account before building the proposed system.
2.1 Previous Research Work
Our work is on protecting user privacy while searching data on untrusted
servers. User privacy can be classified into search privacy and access
privacy [3]. Search privacy means that the server knows nothing about what
the user is searching for, and access privacy means that the cloud knows
nothing about which files are returned to the user. A lot of
work has been conducted in this field, including private searching [4, 5, 6, 7],
private information retrieval (PIR) [8], and searchable encryption. User
privacy can be protected in private searching and PIR, but only search privacy
can be protected in searchable encryption.
Private searching was first proposed by [1, 15], where data is stored in
the clear, and the query is encrypted with the Paillier cryptosystem [14],
which is additively homomorphic. Ranked searchable encryption enables users
to retrieve the best-matching files from the cloud when both the query and
the data are encrypted. The work in
[9], which only supports single-keyword searches, encrypts files and queries
with Order Preserving Symmetric Encryption (OPSE) [10] and uses keyword
frequency to rank results. Its follow-up work [11], which supports
multiple-keyword searches, uses the secure kNN technique [12] to rank
results based on inner products. The main limitation of these approaches is
that user access privacy [13] is not preserved.
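The homomorphic property that private searching relies on can be illustrated with a toy Paillier implementation built on `java.math.BigInteger`. This is a sketch for illustration only: it uses the common simplification g = n + 1, small keys, and no input validation, so it is not a secure implementation. The class and method names are hypothetical.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

/** Toy Paillier cryptosystem (illustration only, not secure). */
public class PaillierSketch {
    private static final SecureRandom RNG = new SecureRandom();
    final BigInteger n, nSquared, g, lambda, mu;

    PaillierSketch(int bits) {
        BigInteger p, q;
        do {
            p = BigInteger.probablePrime(bits / 2, RNG);
            q = BigInteger.probablePrime(bits / 2, RNG);
        } while (p.equals(q));
        n = p.multiply(q);
        nSquared = n.multiply(n);
        g = n.add(BigInteger.ONE);                        // common choice g = n + 1
        BigInteger pm1 = p.subtract(BigInteger.ONE);
        BigInteger qm1 = q.subtract(BigInteger.ONE);
        lambda = pm1.multiply(qm1).divide(pm1.gcd(qm1)); // lcm(p-1, q-1)
        mu = lambda.modInverse(n);                       // valid because g = n + 1
    }

    BigInteger encrypt(BigInteger m) {
        BigInteger r;
        do {
            r = new BigInteger(n.bitLength(), RNG);
        } while (r.signum() == 0 || r.compareTo(n) >= 0 || !r.gcd(n).equals(BigInteger.ONE));
        // c = g^m * r^n mod n^2
        return g.modPow(m, nSquared).multiply(r.modPow(n, nSquared)).mod(nSquared);
    }

    BigInteger decrypt(BigInteger c) {
        // m = L(c^lambda mod n^2) * mu mod n, where L(x) = (x - 1) / n
        BigInteger x = c.modPow(lambda, nSquared);
        return x.subtract(BigInteger.ONE).divide(n).multiply(mu).mod(n);
    }

    /** E(a) * E(b) mod n^2 decrypts to a + b (mod n). */
    BigInteger addEncrypted(BigInteger c1, BigInteger c2) {
        return c1.multiply(c2).mod(nSquared);
    }

    /** E(a)^k mod n^2 decrypts to k * a (mod n). */
    BigInteger multiplyByConstant(BigInteger c, BigInteger k) {
        return c.modPow(k, nSquared);
    }

    public static void main(String[] args) {
        PaillierSketch pk = new PaillierSketch(512);
        BigInteger a = BigInteger.valueOf(15), b = BigInteger.valueOf(27);
        BigInteger sum = pk.addEncrypted(pk.encrypt(a), pk.encrypt(b));
        System.out.println(pk.decrypt(sum));                 // 42
        BigInteger scaled = pk.multiplyByConstant(pk.encrypt(a), BigInteger.valueOf(3));
        System.out.println(pk.decrypt(scaled));              // 45
    }
}
```

Multiplying two ciphertexts adds the hidden plaintexts, and raising a ciphertext to a constant multiplies the hidden plaintext. These are the two operations a server can apply to an encrypted query without learning its contents.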
2.2 Existing System
A key private searching solution was proposed by Ostrovsky et al. [1, 15],
which allows a user to retrieve files of interest from an untrusted server
without leaking any information. It provides the same privacy level as
downloading the entire database from the cloud, in which case the cloud cannot
tell which files the user is really interested in, but with significantly
lower communication cost. However, the Ostrovsky scheme has a high
computational cost, since it requires the cloud to process the query on every
file in a collection; otherwise, the cloud would learn that the files it
skipped are of no interest to the user.
This quickly becomes a performance bottleneck when the cloud needs to
process thousands of queries over a collection of hundreds of thousands of
files.
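Why the cloud must touch every file can be seen in a simplified sketch of the core idea behind Ostrovsky-style private searching. It rests on the following assumptions. The user sends one Paillier ciphertext per dictionary keyword: E(1) for keywords of interest and E(0) otherwise. For every file, the cloud multiplies the ciphertexts of that file's keywords, which yields E(c), where c is the number of matching keywords. It then raises E(c) to the file content, treated here as a small integer, to obtain E(c·F). The buffer and collision-handling steps of the published scheme are omitted, so this toy returns one reply per file and does not achieve the communication savings. The Paillier code is a toy, and all names are hypothetical.

```java
import java.math.BigInteger;
import java.security.SecureRandom;
import java.util.*;

/** Simplified Ostrovsky-style private search: toy Paillier, no buffer, not secure. */
public class PrivateSearchSketch {
    static final SecureRandom RNG = new SecureRandom();
    static BigInteger n, n2, lambda, mu;

    /** Toy Paillier key generation with g = n + 1. */
    static void keyGen(int bits) {
        BigInteger p = BigInteger.probablePrime(bits / 2, RNG), q;
        do { q = BigInteger.probablePrime(bits / 2, RNG); } while (q.equals(p));
        n = p.multiply(q);
        n2 = n.multiply(n);
        BigInteger pm1 = p.subtract(BigInteger.ONE), qm1 = q.subtract(BigInteger.ONE);
        lambda = pm1.multiply(qm1).divide(pm1.gcd(qm1)); // lcm(p-1, q-1)
        mu = lambda.modInverse(n);
    }

    static BigInteger enc(long m) {
        BigInteger r;
        do { r = new BigInteger(n.bitLength() - 1, RNG); }
        while (r.signum() == 0 || !r.gcd(n).equals(BigInteger.ONE));
        // With g = n + 1, g^m mod n^2 equals 1 + m*n mod n^2.
        BigInteger gm = BigInteger.ONE.add(BigInteger.valueOf(m).multiply(n)).mod(n2);
        return gm.multiply(r.modPow(n, n2)).mod(n2);
    }

    static BigInteger dec(BigInteger c) {
        return c.modPow(lambda, n2).subtract(BigInteger.ONE).divide(n).multiply(mu).mod(n);
    }

    /** User side: one ciphertext per dictionary keyword, E(1) if wanted, else E(0). */
    static Map<String, BigInteger> buildQuery(List<String> dictionary, Set<String> wanted) {
        Map<String, BigInteger> query = new HashMap<>();
        for (String w : dictionary) query.put(w, enc(wanted.contains(w) ? 1 : 0));
        return query;
    }

    /** Cloud side: must process EVERY file and returns {E(c), E(c * F)} per file. */
    static Map<String, BigInteger[]> cloudProcess(Map<String, BigInteger> query,
            Map<String, Long> content, Map<String, Set<String>> keywords) {
        Map<String, BigInteger[]> replies = new TreeMap<>();
        for (String f : content.keySet()) {
            BigInteger ec = BigInteger.ONE; // 1 is a trivial encryption of 0
            for (String w : keywords.get(f)) ec = ec.multiply(query.get(w)).mod(n2);
            BigInteger ecf = ec.modPow(BigInteger.valueOf(content.get(f)), n2);
            replies.put(f, new BigInteger[] {ec, ecf});
        }
        return replies;
    }

    /** User side: c = number of matched keywords; F = (c * F) / c when c > 0. */
    static Map<String, Long> userRecover(Map<String, BigInteger[]> replies) {
        Map<String, Long> found = new TreeMap<>();
        for (Map.Entry<String, BigInteger[]> e : replies.entrySet()) {
            BigInteger c = dec(e.getValue()[0]);
            if (c.signum() > 0) found.put(e.getKey(), dec(e.getValue()[1]).divide(c).longValue());
        }
        return found;
    }

    public static void main(String[] args) {
        keyGen(512);
        Map<String, Long> content = new TreeMap<>(Map.of("F1", 11L, "F2", 22L, "F3", 33L));
        Map<String, Set<String>> keywords = Map.of(
                "F1", Set.of("A", "B"), "F2", Set.of("C"), "F3", Set.of("B", "C"));
        Map<String, BigInteger> query = buildQuery(List.of("A", "B", "C"), Set.of("B"));
        System.out.println(userRecover(cloudProcess(query, content, keywords))); // {F1=11, F3=33}
    }
}
```

The loop in `cloudProcess` runs over all files regardless of the query, because the cloud cannot tell an E(0) entry from an E(1) entry. This per-file work is the computational bottleneck described above.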

2.3 Proposed System
The proposed scheme is Efficient Information Retrieval for Ranked Query
(EIRQ), in which each user can choose the rank of his query to determine
the percentage of matched files to be returned. The basic idea of EIRQ is to
construct a privacy-preserving mask matrix that allows the cloud to filter out
a certain percentage of matched files before returning them to the ADL. The
system consists of three types of entities: the cloud, the aggregation and
distribution layer (ADL), and users. Multiple files are stored in a potentially
untrusted cloud, where each file is described by several distinct keywords. The
union of all keywords forms a dictionary. Users query the ADL using keywords
from the dictionary. The ADL aggregates user queries and queries
the cloud with a combined query. The cloud returns the files matching
the combined query to the ADL. The ADL is assumed to be trusted by all
users since it is deployed within the organisation itself, and the
communication channels are assumed to be secured by security protocols such as
SSL. Each user individually sends a query to the ADL, which distributes the
appropriate files to each user. As long as the ADL is trusted and
correctly executes our schemes, a user cannot learn anything about other
users' interests. Thus, the cloud is the only adversary for each user. The
cloud is assumed to be honest but curious; that is, it follows our schemes
but still tries to learn additional information. To reduce the communication
cost, a differentiated query service is provided that allows each user to select
a particular rank for his query. In general, the higher the rank of a user's
query, the more matched files are returned to the user. This feature
is useful when a large number of files match a user's query
but the user only needs a small subset of them. This can be illustrated
with the following example. Assume that Alice wants to retrieve 2% of the
files that contain keywords A, B, and Bob wants to retrieve 20% of the
files that contain keywords A, C. Suppose that the cloud holds 500 files
described by keywords A, B and 500 files described by keywords A, C.
Since every file contains keyword A, each query matches all 1000 files.
Without combination, the cloud will have to return 2000 files (1000 for each
user), and without ranking, it will have to return the 1000 files matching the
combined query. However, only 110 of them are actually needed: 10 files
(2% of the 500 files described by A, B) for Alice and 100 files (20% of the
500 files described by A, C) for Bob.
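Filtering a percentage of matched files with a mask matrix can be illustrated in plaintext. The following Java sketch is an illustration under stated assumptions, not the EIRQ construction itself. It assumes the ADL builds a d-by-r Boolean matrix with one row per dictionary keyword and keeps the largest percentage p requested for each keyword. The ADL then sets the first round(p·r) cells of that keyword's row to true. It also assumes the cloud maps each file to one column by hashing the file name and returns the file if any of its keywords has a true cell in that column. In the actual scheme the matrix is privacy-preserving, so the cloud cannot read it; here it is in the clear. All names are hypothetical.

```java
import java.util.*;

/** Plaintext illustration of percentage-based filtering with a mask matrix. */
public class MaskMatrixSketch {

    /**
     * ADL side (hypothetical): one row per dictionary keyword and r columns. If the
     * largest fraction requested for a keyword is p, the first round(p * r) cells of
     * its row are set to true.
     */
    static boolean[][] buildMask(List<String> dictionary, Map<String, Double> maxFraction, int r) {
        boolean[][] mask = new boolean[dictionary.size()][r];
        for (int i = 0; i < dictionary.size(); i++) {
            double p = maxFraction.getOrDefault(dictionary.get(i), 0.0);
            int ones = (int) Math.min(r, Math.round(p * r));
            for (int j = 0; j < ones; j++) {
                mask[i][j] = true;
            }
        }
        return mask;
    }

    /**
     * Cloud side (hypothetical): each file is mapped to one column by hashing its name
     * and is kept if any of its keywords has a true cell in that column.
     */
    static List<String> filter(boolean[][] mask, List<String> dictionary,
                               Map<String, Set<String>> files) {
        int r = mask[0].length;
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Set<String>> f : files.entrySet()) {
            int column = Math.floorMod(f.getKey().hashCode(), r);
            for (String w : f.getValue()) {
                int row = dictionary.indexOf(w);
                if (row >= 0 && mask[row][column]) {
                    kept.add(f.getKey());
                    break;
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> dictionary = List.of("A", "B", "C");

        // The example above: Alice asks for 2% on {A, B}, Bob for 20% on {A, C}.
        // The ADL keeps the largest fraction requested for each keyword.
        Map<String, Double> maxFraction = new HashMap<>();
        for (String w : List.of("A", "B")) maxFraction.merge(w, 0.02, Double::max);
        for (String w : List.of("A", "C")) maxFraction.merge(w, 0.20, Double::max);

        // 500 files described by {A, B} and 500 files described by {A, C}.
        Map<String, Set<String>> files = new LinkedHashMap<>();
        for (int i = 1; i <= 1000; i++) {
            files.put("F" + i, i <= 500 ? Set.of("A", "B") : Set.of("A", "C"));
        }

        boolean[][] mask = buildMask(dictionary, maxFraction, 10);
        System.out.println(filter(mask, dictionary, files).size()); // 200
    }
}
```

For the example above, this sketch returns 200 of the 1000 files. That is fewer than without ranking, but more than the 110 actually needed, because keyword A is shared and receives Bob's 20%. With r = 10, Alice's 2% request for B rounds to zero columns; a larger r gives finer-grained percentages.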

Chapter 3
Software Requirement Specification
A Software Requirements Specification (SRS) is a complete description of
the behaviour of the system to be developed. It includes the functional and
non-functional requirements for the software to be developed. Functional
requirements describe what the software should do, while non-functional
requirements define how the system is supposed to behave. Requirements must
be measurable, testable, related to identified needs or opportunities, and
defined to a level of detail sufficient for system design. Requirement
specification adds more information to the requirement definition and is
usually presented with the system models developed during requirement
analysis. The specification together with the models should describe the
system to be designed and implemented. After the analysis phase is completed,
the requirements must be written down or specified; the final output
is the SRS document. The desirable characteristics of an SRS are that it is
understandable, unambiguous, complete, verifiable, consistent, modifiable and
traceable. The goal of the requirement analysis phase is to produce the
software requirements document, which describes what the proposed system
should do without describing how the software will do it. This document also
provides a reference for validating the final product. Writing a software
requirements specification reduces development effort, as careful review of
the document can reveal omissions, misunderstandings, and inconsistencies
early in the development cycle, when these problems are easier to correct.
3.1 Overall Description
The SRS includes a set of use cases
that describe all the interactions that the users will have with the software;
these use cases capture the functional requirements. In addition to use
cases, the SRS also contains non-functional (or supplementary) requirements,
which impose constraints on the design or implementation (such as performance
engineering requirements, quality standards or design constraints).
3.1.1 Product Perspective
The project is a Java-based application that supports
uploading files to the cloud along with keywords, searching for files in
the cloud with relevant keywords through the ADL, and the addition or
deletion of users by the administrator. The user interface is designed
in Dreamweaver, which provides a variety of rich user-interface features.
The design aims at consistency and extensibility, which promote code reuse
and ease of searching for files. The product design
also focuses on consistency maintenance, usability,
testability and portability. Portability is ensured by using Java, an
object-oriented programming language. Finally, the project
aims to maintain e-documents in an organized manner.

3.1.2 Product Functions
Efficient Information Retrieval in Cloud Environment with Privacy Preserving
addresses two fundamental issues in the cloud environment: privacy
and efficiency. The system being developed should be user-friendly and
reliable software for this purpose. The primary function of the project is
to reduce the communication and computational costs between the users and
the cloud while providing privacy. Access privacy is provided to
the users in an organisation who access files from the cloud. Efficiency
is improved in terms of communication and computational cost
compared with previous techniques.
3.1.3 User Characteristics
Administrator
The administrator is given special privileges to perform the following
tasks, which other users do not possess:
Uploading data files into the server node (physical storage or cloud)
along with their keywords.
Creating a new user.
Modifying the documents.
User
A user is a member of the organization and is given the privilege to
perform the following activities:
Search for the required documents with keywords.
Specify the rank to fetch the required percentage of matched files.
The searching activity is performed by both the administrator and the
user.
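The split of privileges between the administrator and ordinary users can be sketched as a simple role-to-permission mapping. The enum and permission names below are hypothetical and only mirror the tasks listed above; how the application actually enforces these rules is not shown here.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical role/permission model mirroring the user characteristics above. */
public class RoleSketch {

    enum Permission { UPLOAD_FILE, CREATE_USER, MODIFY_DOCUMENT, SEARCH, SET_RANK }

    enum Role {
        // Administrator-only tasks, plus searching, which both roles share.
        ADMINISTRATOR(EnumSet.of(Permission.UPLOAD_FILE, Permission.CREATE_USER,
                                 Permission.MODIFY_DOCUMENT, Permission.SEARCH)),
        // An ordinary organization user may search and choose a rank.
        USER(EnumSet.of(Permission.SEARCH, Permission.SET_RANK));

        private final Set<Permission> permissions;

        Role(Set<Permission> permissions) {
            this.permissions = permissions;
        }

        boolean can(Permission p) {
            return permissions.contains(p);
        }
    }

    public static void main(String[] args) {
        System.out.println(Role.USER.can(Permission.UPLOAD_FILE));          // false
        System.out.println(Role.ADMINISTRATOR.can(Permission.CREATE_USER)); // true
        System.out.println(Role.USER.can(Permission.SEARCH));               // true
    }
}
```

Keeping the privileges in one table makes it easy to check a request against the logged-in role before serving a page, and to extend the model if new roles are added.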

3.1.4 Pre-requisites
The user should have installed one of the following operating systems:
Windows, Linux or Mac, on a system with the hardware specification given
in a subsequent section.
The user should have basic knowledge of how to run the application in a
web browser.
The user should be familiar with the concept of searching and the other
functionalities.
The user should be aware of the type of input that has to be given to the
application.
There must be a reliable Internet connection to communicate with the
cloud service.
3.1.5 Assumptions and Dependencies
The following assumptions are made in the development of our project:
Data files must be uploaded on the server side only by the administrator,
along with the appropriate keywords. The project also assumes that
the server never goes down during the upload process, i.e. it remains
available throughout. The following are the dependencies of our project:
This project is coded in Java, and the application requires a properly
installed Eclipse environment. Anyone who wishes to work on further
development of this project should therefore know this programming language.
The network must be properly configured before executing the
application.

3.2 Specific Requirements
This section of the SRS contains all the software requirements at a
level of detail sufficient to enable designers to design a system that
satisfies those requirements. It also helps testers design test cases to verify
whether the system satisfies the specified requirements.
3.2.1 Functional Requirement

3.2.1.1 Purpose
In this application, forms such as the login page and the search page are created for users and the administrator, providing a layer of security.
Buttons and text fields are provided to take input from the user.
Error messages are shown for invalid inputs, and separate alert messages confirm successful outcomes such as the creation or deletion of users.
For the administrator, an interface must be provided for uploading, modifying and viewing files.
Normal users are allowed to search for files and download them from the cloud by giving the appropriate keyword.
To upload a data file, a scroll box must be provided to select the required file from the client machine.
The ADL takes the requests from multiple users, merges them into a single query and sends it to the cloud.
After filtering the files, the cloud returns them to the ADL, which distributes them among the users according to their ranks. The Java libraries used within the Eclipse environment provide the classes needed to implement the above requirements.
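The merging and distribution steps described for the ADL can be illustrated with a minimal Java sketch. It assumes, for illustration only, that a merged query is simply the union of all users' keywords and that the cloud answers with a map from keyword to matching file names. The class and method names (AdlMerger, UserQuery, merge, distribute) are hypothetical and are not taken from the project's code, and rank-based trimming of each user's share is deliberately left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch only: AdlMerger and UserQuery are hypothetical names, not the project's classes.
public class AdlMerger {

    /** One pending request held by the ADL: who asked and for which keywords. */
    public static final class UserQuery {
        final String userId;
        final List<String> keywords;

        public UserQuery(String userId, List<String> keywords) {
            this.userId = userId;
            this.keywords = keywords;
        }
    }

    /** Normalises a keyword so that "Cloud" and "cloud " count as the same term. */
    static String normalise(String keyword) {
        return keyword.trim().toLowerCase(Locale.ROOT);
    }

    /** Union of all users' keywords: the single query forwarded to the cloud. */
    public static Set<String> merge(List<UserQuery> pending) {
        Set<String> merged = new TreeSet<>();
        for (UserQuery q : pending) {
            for (String kw : q.keywords) {
                merged.add(normalise(kw));
            }
        }
        return Collections.unmodifiableSet(merged);
    }

    /**
     * Routes the cloud's combined answer (keyword -> matching files) back to
     * each user, who receives only the files for his or her own keywords.
     */
    public static Map<String, List<String>> distribute(List<UserQuery> pending,
                                                       Map<String, List<String>> cloudResult) {
        Map<String, List<String>> perUser = new LinkedHashMap<>();
        for (UserQuery q : pending) {
            Set<String> files = new LinkedHashSet<>(); // keeps order, drops duplicates
            for (String kw : q.keywords) {
                List<String> matches = cloudResult.get(normalise(kw));
                if (matches != null) {
                    files.addAll(matches);
                }
            }
            perUser.put(q.userId, new ArrayList<>(files));
        }
        return perUser;
    }

    public static void main(String[] args) {
        List<UserQuery> pending = Arrays.asList(
                new UserQuery("alice", Arrays.asList("Cloud", "privacy")),
                new UserQuery("bob", Arrays.asList("cloud", "ranking")));

        System.out.println(merge(pending)); // [cloud, privacy, ranking]

        Map<String, List<String>> cloudResult = new LinkedHashMap<>();
        cloudResult.put("cloud", Arrays.asList("a.txt", "b.txt"));
        cloudResult.put("privacy", Arrays.asList("b.txt", "c.txt"));
        cloudResult.put("ranking", Arrays.asList("d.txt"));

        // {alice=[a.txt, b.txt, c.txt], bob=[a.txt, b.txt, d.txt]}
        System.out.println(distribute(pending, cloudResult));
    }
}
```

Because "cloud" is requested by both users, the merged query carries it only once; this de-duplication is where the communication saving of a merging layer would come from.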
3.2.1.2 Process
A step-by-step approach is followed to develop the web application. Requirements may be taken in at any phase of the project, as and when detailed communication with the client organization takes place. A dummy model is developed to check the basic requirements. The prototype model is then checked against new requirements; if the model fails to meet them, that phase of the project is corrected then and there. Hence testability is high and the chance of erroneous conditions is minimal. This approach is commonly called the iterative approach.
3.2.2 Non Functional Requirement
Scalability: The present system is designed to process huge amounts of data.
Usability: An attempt is made to make the GUI interactive and user friendly.
Security: Users are divided into organization users and administrators, each with a login id and password.
Portability: The system is designed to be cross-platform and is supported on a wide range of hardware and software platforms.
Testability: Testability is high since an iterative approach is used.
Consistency and Extensibility: The code can be reused and extended to implement other functionalities.

3.2.3 Software Requirements
This service is portable and can run on any system with the following minimum software configuration:
Operating System : Windows XP/7/8
Application Server : Tomcat 6.0/7.0.x
Front End : HTML5, Java, JSP
Scripts : JavaScript.
Server side Script : Java Server Pages.
Database : MySQL
Database Connectivity : JDBC.
3.2.4 Hardware Requirements
The PC configuration should be as follows:
Speed - 1.1 GHz
RAM - 256 MB (min)
Hard Disk - 20 GB
Key Board - Standard Windows Keyboard
Mouse - Two or Three Button Mouse

Chapter 4
System Design
System design is a process through which requirements are translated into a representation of the software. Initially the representation depicts a holistic view of the software. System design is concerned with how the system functionality is to be provided by the different components of the system. It leads to a design representation that is very close to the source code. The importance of system design can be stated with a single word: quality. Design is the place where quality is fostered in software development. It provides us with a representation of software that can be assessed for quality. It is the only phase in which customer requirements can be accurately translated into a finished software product or system. Design is a creative process that requires insight into the system. The design process involves developing several models of the system at different levels of abstraction. It includes the design of algorithms, modules, components and subsystems.
As a design is decomposed, errors and omissions made in earlier stages are discovered. This feedback allows the earlier design of the model to be improved. Design is the first step in moving from the problem domain towards the solution domain; it is essentially the bridge between the requirement specification and the final solution that satisfies the requirements. Since our system is cloud-based, it can be accessed from anywhere over the Internet.
4.1 Design Considerations
Design represents the development phase for any engineering product or system. It may be defined as the process of applying various techniques and principles for the purpose of defining a process or a system in sufficient detail to permit its physical realization. The design activity begins when the requirements document for the software to be developed is available. While the requirement specification activity is entirely in the problem domain, design is the first step in moving from the problem domain to the solution domain. The goal of the design process is to produce a model or representation of a system which can be used later to build that system. This section describes the issues that need to be addressed or resolved before attempting to devise a complete design solution.
4.2 Development Method
This project uses the waterfall lifecycle model for development. The waterfall model is an activity-centred lifecycle model. It proceeds step by step: all the requirements of one activity are completed before the design of that activity is started. The entire project design is broken down into several small tasks in order of precedence, and these tasks are designed one by one, making sure they work correctly. Once one of these small tasks is completed, another task that depends on it can be started. Each step, after being completed, is verified to ensure that the task works, is error-free and meets all the requirements. This lifecycle model was chosen for two main reasons. The first is simplicity: using the waterfall model, the entire project can be broken down into smaller activities which can be converted relatively easily into code, and once everything is combined, the code for the whole project is obtained. The second is that the verification step required by the waterfall model ensures that a task is error-free before other tasks that depend on it are developed. Thus the chance of an error remaining somewhere high up in the task hierarchy is relatively low.
Some of the distinctive features of the waterfall model are:
It can be implemented for projects of all sizes.
It leads to a concrete and clear approach to software development.
In this model, testing is inherent in every phase.
Documentation is produced at every stage of the model, which is very helpful for the people involved.
The model consists of the following distinct stages:
Requirement Analysis & Definition: In this stage the problem is specified along with the desired service objectives (goals), and the constraints are identified. In this phase the requirements were gathered for the enhancement project.
System & Software Design: In this stage the system specifications are translated into a software representation. The software engineer at this stage is concerned with data structures, software architecture and interface representations. In this phase system and software design was carried out: the system architecture was decided, and class diagrams and sequence diagrams were drawn.
Implementation: In this stage the designs are translated into the software domain. In this phase the actual coding of the project was done.
Unit, Integration & System Testing: Testing at this stage focuses on making sure that any errors are identified and that the software meets its required specification. After this stage the software is delivered to the customer. In this phase unit, integration and system testing were carried out.
Operations & Maintenance: In this phase the software is updated to meet changing customer needs, adapted to accommodate changes in the external environment, corrected for errors and oversights left undetected in the testing phases, and enhanced for efficiency. This is the last phase of the project lifecycle.
4.3 System Architecture
Large systems are always decomposed into sub-systems that provide some
related set of services. The initial design process of identifying these sub-
systems and establishing a framework for sub-system control and communi-
cation is called architecture design and the output of this design process is a
description of the software architecture. The architectural design process is
concerned with establishing a basic structural framework for a system. It in-
volves identifying the major components of the system and communications
between these components. In the following sub-sections we explore the design
aspects and the sub-systems involved in this software package.
Figure 4.1: System architecture
4.4 Flow charts
A flowchart is a type of diagram that represents an algorithm, workflow or process, showing the steps as boxes of various kinds and their order by connecting them with arrows. This diagrammatic representation illustrates a solution to a given problem. Flowcharts are used in analysing, designing, documenting or managing a process or program in various fields, and especially in designing and documenting complex processes or programs. Like other types of diagrams, they help visualize what is going on and thereby help people understand a process, and perhaps also find flaws, bottlenecks, and other less obvious features within it. There are many different types of flowcharts, and each type has its own repertoire of boxes and notational conventions.
The two most common types of boxes in a flowchart are:
a processing step, usually called activity, and denoted as a rectangular
box
a decision, usually denoted as a diamond.
A flowchart is described as "cross-functional" when the page is divided
into different swim-lanes describing the control of different organizational
units. A symbol appearing in a particular "lane" is within the control of that
organizational unit. This technique allows the author to locate the respon-
sibility for performing an action or making a decision correctly, showing
the responsibility of each organizational unit for different parts of a single
process. Flowcharts depict certain aspects of processes and they are usually
complemented by other types of diagram. For instance, Kaoru Ishikawa de-
fined the flowchart as one of the seven basic tools of quality control, next
to the histogram, Pareto chart, check sheet, control chart, cause-and-effect
diagram, and the scatter diagram.
4.4.1 Flowchart for User
Figure 4.2: Flowchart for user
4.4.2 Flowchart for ADL
Figure 4.3: Flowchart for ADL
4.4.3 Flowchart for Admin
Figure 4.4: Flowchart for Admin

4.4.4 Flowchart for Cloud
Figure 4.5: Flowchart for Cloud
4.5 Data Flow Diagram
A data flow diagram (DFD) is a graphical representation of the "flow" of data through an information system, modelling its process aspects. DFDs are often a preliminary step used to create an overview of the system which can later be elaborated, and they can also be used for the visualization of data processing (structured design). A DFD shows what kind of information will be input to and output from the system, where the data will come from and go to, and where the data will be stored. On a DFD, data items flow from an external data source or an internal data store to an internal data store or an external data sink, via an internal process. A DFD provides no information about the timing or ordering of processes, or about whether processes will operate in sequence or in parallel. It is therefore quite different from a flowchart, which shows the flow of control through an algorithm, allowing a reader to determine what operations will be performed, in what order, and under what circumstances, but not what kinds of data will be input to and output from the system, nor where the data will come from and go to, nor where the data will be stored (all of which are shown on a DFD). When it comes to conveying how data flows through systems (and how that data is transformed in the process), DFDs are preferred over technical descriptions for three principal reasons.

1. DFDs are easy for both technical and nontechnical audiences to understand.
2. DFDs can provide a high level system overview, complete with bound-
aries and connections to other systems.
3. DFDs can provide a detailed representation of system components.
DFDs help system designers and others during initial analysis stages visual-
ize a current system or one that may be necessary to meet new requirements.
Systems analysts prefer working with DFDs, particularly when they require
a clear understanding of the boundary between existing systems and postu-
lated systems. DFDs represent the following:
1. External devices sending and receiving data
2. Processes that change that data
3. Data flows themselves
4. Data storage locations
It is common practice to draw the context-level data flow diagram first, which shows the interaction between the system and the external agents that act as data sources and data sinks. This helps to create an accurate drawing in the context diagram. The system's interactions with the outside world are modelled purely in terms of data flows across the system boundary. The context diagram shows the entire system as a single process and gives no clues as to its internal organization. This context-level DFD is next "exploded" to produce a Level 1 DFD that shows some of the detail of the system being modelled. The Level 1 DFD shows how the system is divided into sub-systems (processes), each of which deals with one or more of the data flows to or from an external agent, and which together provide all of the functionality of the system as a whole. It also identifies internal data stores that must be present in order for the system to do its job, and shows the flow of data between the various parts of the system. Data flow diagrams were proposed by Larry Constantine, the original developer of structured design, based on Martin and Estrin's "data flow graph" model of computation. Data flow diagrams are one of the three essential perspectives of the Structured Systems Analysis and Design Method (SSADM). The sponsor of a project and the end users will need to be briefed and consulted throughout all stages of a system's evolution. With a data flow diagram, users are able to visualize how the system will operate, what the system will accomplish, and how the system will be implemented. The old system's data flow diagrams can be drawn up and compared with the new system's data flow diagrams to implement a more efficient system. Data flow diagrams can give the end user a physical idea of where the data they input ultimately has an effect upon the structure of the whole system, from order to dispatch to report. How any system is developed can be determined through a data flow diagram model. In the course of developing a set of levelled data flow diagrams, the analyst/designer is forced to address how the system may be decomposed into component sub-systems, and to identify the transaction data in the data model. Data flow diagrams can be used in both the analysis and design phases of the SDLC.
4.6 Use case Diagram
A use case diagram in the Unified Modelling Language (UML) is a type of behavioural diagram defined by and created from a use-case analysis. Its purpose is to present a graphical overview of the functionality provided by a system in terms of actors, their goals (represented as use cases), and any dependencies between those use cases. The main purpose of a use case diagram is to show which system functions are performed for which actor. The roles of the actors in the system can also be depicted. Use case diagrams are formally included in two modelling languages defined by the OMG: the Unified Modelling Language (UML) and the Systems Modelling Language (SysML). Diagram building blocks:
Use cases - A use case describes a sequence of actions that provides something of measurable value to an actor and is drawn as a horizontal ellipse.
Actors - An actor is a person, organization, or external system that plays a role in one or more interactions with the system.
4.6.1 Use case Diagram for Users
Figure 4.6: Use case for User
Use Case: User
Summary: To generate queries and download files.
Actor: User
Precondition: The user should open the application and provide the appropriate keyword while querying a file.
Description: The user comes across the query screen and is asked to give the keywords along with the rank, which determines the percentage of matched files to be retrieved. If a keyword is wrong, an exception is thrown.
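The relation between rank and the percentage of matched files can be made concrete with a short calculation. The sketch below assumes a hypothetical linear rule that the report does not specify: with R ranks numbered 0 (highest) to R-1, a rank-r query receives ceil(n(R-r)/R) of the n matched files, so with R = 3 the shares are 100%, about 67% and about 33%. It also assumes the cloud lists matches in relevance order, so the "leading share" is the most relevant part. RankedRetrieval and UnknownKeywordException are illustrative names only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Sketch under assumptions: the linear rank rule and all names here are hypothetical.
public class RankedRetrieval {

    /** Thrown when the user enters a keyword under which no file is indexed. */
    public static class UnknownKeywordException extends Exception {
        public UnknownKeywordException(String keyword) {
            super("No files are indexed under keyword: " + keyword);
        }
    }

    /**
     * Number of files a rank-r query receives out of `matched` files when there
     * are totalRanks ranks (0 = highest): ceil(matched * (totalRanks - r) / totalRanks),
     * computed with integer arithmetic to avoid floating-point rounding surprises.
     */
    public static int filesToReturn(int matched, int rank, int totalRanks) {
        if (totalRanks <= 0 || rank < 0 || rank >= totalRanks) {
            throw new IllegalArgumentException("rank must lie in [0, " + totalRanks + ")");
        }
        return (matched * (totalRanks - rank) + totalRanks - 1) / totalRanks;
    }

    /** Returns the leading share of the files indexed under keyword for the given rank. */
    public static List<String> retrieve(Map<String, List<String>> index, String keyword,
                                        int rank, int totalRanks) throws UnknownKeywordException {
        List<String> matched = index.get(keyword.trim().toLowerCase(Locale.ROOT));
        if (matched == null) {
            throw new UnknownKeywordException(keyword); // the "wrong keyword" case
        }
        int n = filesToReturn(matched.size(), rank, totalRanks);
        return new ArrayList<>(matched.subList(0, n));
    }

    public static void main(String[] args) {
        Map<String, List<String>> index = new HashMap<>();
        index.put("cloud", Arrays.asList("f1", "f2", "f3", "f4", "f5", "f6"));

        try {
            for (int rank = 0; rank < 3; rank++) {
                System.out.println("rank " + rank + " -> " + retrieve(index, "cloud", rank, 3));
            }
            // rank 0 -> [f1, f2, f3, f4, f5, f6]
            // rank 1 -> [f1, f2, f3, f4]
            // rank 2 -> [f1, f2]
            retrieve(index, "unknown", 0, 3);
        } catch (UnknownKeywordException e) {
            System.out.println(e.getMessage()); // No files are indexed under keyword: unknown
        }
    }
}
```

Rounding up guarantees that a query with at least one match never comes back empty, whatever its rank.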
4.6.2 Use case Diagram for ADL
Figure 4.7: Use case for ADL
Use Case: ADL
Summary: To send queries to the cloud and distribute the results among users.
Actor: ADL
Precondition: The ADL should have the keywords.
Description: The ADL waits for keywords for a certain period of time, merges all the queries from different users into a single query and sends it to the cloud. After receiving the merged result from the cloud, the ADL distributes the result among the users according to their keywords and ranks.
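The ADL's waiting period can be modelled as a time window over a queue of incoming queries. This sketch assumes a java.util.concurrent.BlockingQueue as the hand-off between users and the ADL, with a window length chosen by the caller, since the report does not state one. Every query that arrives before the window closes is collected into one batch for merging. QueryWindow and collectBatch are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch: QueryWindow, collectBatch and the window lengths are assumptions.
public class QueryWindow {

    /**
     * Collects every query that arrives within windowMillis of the call and
     * returns them as one batch for merging.
     */
    public static <T> List<T> collectBatch(BlockingQueue<T> incoming, long windowMillis)
            throws InterruptedException {
        List<T> batch = new ArrayList<>();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(windowMillis);
        while (true) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                break; // window closed; anything still queued waits for the next batch
            }
            T next = incoming.poll(remaining, TimeUnit.NANOSECONDS);
            if (next == null) {
                break; // no further query arrived before the deadline
            }
            batch.add(next);
        }
        return batch;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> incoming = new LinkedBlockingQueue<>();

        // A hypothetical producer thread standing in for users submitting queries.
        Thread users = new Thread(() -> {
            try {
                incoming.put("alice:cloud");
                Thread.sleep(20);
                incoming.put("bob:privacy");
                Thread.sleep(500); // intended to arrive after the first 200 ms window
                incoming.put("carol:ranking");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        users.start();

        System.out.println("batch 1: " + collectBatch(incoming, 200));
        System.out.println("batch 2: " + collectBatch(incoming, 1000));
        users.join();
    }
}
```

Queries that arrive after a window closes stay in the queue and are picked up by the next call, so no request is lost between batches; the exact grouping in the demo depends on thread timing.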
4.6.3 Use case Diagram for Administrator
Figure 4.8: Use case for Administrator
Use Case: Admin
Summary: To upload files to the cloud.
Actor: Administrator
Precondition: The admin should have the files that are to be uploaded to the cloud.
Description: The administrator comes across the upload screen, where he can upload files to the cloud along with the appropriate keywords for each file. The admin can also view the details of the uploaded files and of the registered users.
4.6.4 Use case Diagram for Cloud
Figure 4.9: Use case for Cloud
Use Case: Cloud
Summary: To view file details and query details.
Actor: Cloud
Precondition: The cloud should already have the files uploaded to it.
Description: The cloud monitors the uploaded files and the user queries. Based on the user query, the cloud fetches the matching files from its storage and sends them to the ADL.
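On the cloud side, answering a merged query amounts to looking up each keyword in an index built when the administrator uploads files with their keywords. The following in-memory sketch stands in for that lookup. CloudIndex, upload and answer are hypothetical names, and a real deployment would presumably keep this mapping in the database (the report lists MySQL) rather than in a HashMap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Sketch: CloudIndex is a hypothetical in-memory stand-in for the cloud's file store.
public class CloudIndex {

    // keyword -> names of the files uploaded under that keyword
    private final Map<String, List<String>> filesByKeyword = new HashMap<>();

    private static String normalise(String keyword) {
        return keyword.trim().toLowerCase(Locale.ROOT);
    }

    /** Called when the administrator uploads a file together with its keywords. */
    public void upload(String fileName, String... keywords) {
        for (String kw : keywords) {
            filesByKeyword.computeIfAbsent(normalise(kw), k -> new ArrayList<>()).add(fileName);
        }
    }

    /**
     * Answers a merged query from the ADL: for each requested keyword, the list
     * of matching files (empty if nothing matches).
     */
    public Map<String, List<String>> answer(Collection<String> mergedKeywords) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String kw : mergedKeywords) {
            List<String> files = filesByKeyword.getOrDefault(normalise(kw), Collections.emptyList());
            result.put(normalise(kw), Collections.unmodifiableList(new ArrayList<>(files)));
        }
        return result;
    }

    public static void main(String[] args) {
        CloudIndex cloud = new CloudIndex();
        cloud.upload("report.pdf", "cloud", "privacy");
        cloud.upload("notes.txt", "Cloud");
        cloud.upload("survey.doc", "ranking");

        // {cloud=[report.pdf, notes.txt], privacy=[report.pdf], missing=[]}
        System.out.println(cloud.answer(Arrays.asList("cloud", "privacy", "missing")));
    }
}
```

Returning an empty list for an unknown keyword keeps the combined answer complete, leaving it to the ADL to decide whether that user should receive an error.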
4.7 Sequence Diagram
UML sequence diagrams are used to represent or model the flow of messages, events and actions between the objects or components of a system. Time is represented in the vertical direction, showing the sequence of interactions of the header elements, which are displayed horizontally at the top of the diagram. Sequence diagrams are used primarily to design, document and validate the architecture, interfaces and logic of the system by describing the sequence of actions that need to be performed to complete a task or scenario. UML sequence diagrams are useful design tools because they provide a dynamic view of the system behaviour, which can be difficult to extract from static diagrams or specifications. A sequence diagram is an interaction diagram that shows how processes operate with one another and in what order. It is a construct of a Message Sequence Chart. A sequence diagram shows object interactions arranged in time sequence. It depicts the objects and classes involved in the scenario and the sequence of messages exchanged between the objects needed to carry out the functionality of the scenario. Sequence diagrams are typically associated with use case realizations in the Logical View of the system under development, and are sometimes called event diagrams or event scenarios. A sequence diagram shows, as parallel vertical lines (lifelines), different processes or objects that live simultaneously, and, as horizontal arrows, the messages exchanged between them, in the order in which they occur. This allows the specification of simple runtime scenarios in a graphical manner. If the lifeline is that of an object, it demonstrates a role. Leaving the instance name blank can represent anonymous and unnamed instances. Messages, written as horizontal arrows with the message name written above them, display interaction. Solid arrowheads represent synchronous calls, open arrowheads represent asynchronous messages, and dashed lines represent reply messages. If a caller sends a synchronous message, it must wait until the message is done, as when invoking a subroutine. If a caller sends an asynchronous message, it can continue processing and does not have to wait for a response. Asynchronous calls are present in multithreaded applications and in message-oriented middleware. Activation boxes, or method-call boxes, are opaque rectangles drawn on top of lifelines to represent that processes are being performed in response to the message (ExecutionSpecifications in UML). Objects calling methods on themselves use messages and add new activation boxes on top of any others to indicate a further level of processing. When an object is destroyed (removed from memory), an X is drawn on top of the lifeline, and the dashed line ceases to be drawn below it. The destruction should be the result of a message, either from the object itself or from another object. A message sent from outside the diagram can be represented by a message originating from a filled-in circle (found message in UML) or from a border of the sequence diagram (gate in UML).
Figure 4.10: Sequence diagram

Chapter 5
Implementation
Implementation is the stage of the project in which the theoretical design is turned into a working system. Thus it can be considered the most critical stage in achieving a successful new system and in giving the user confidence that the new system will work and be effective. The implementation stage involves careful planning, investigation of the existing system and its constraints on implementation, design of methods to achieve changeover, and evaluation of changeover methods.
5.1 Implementation Requirements
Implementation of any software is always preceded by important decisions regarding the selection of the platform, the language used, etc. These decisions are often influenced by several factors, such as the real environment in which the system works, the speed that is required and other implementation-specific details. Two major implementation decisions were made before the implementation of this project:
1. Selection of the programming language for development of the application.
2. Selection of the development platform: Windows 7 (64-bit), Eclipse Kepler 4.3.2, and JDK/JRE 7.
5.2 Selection of the Platform
The project chose Windows 7 Ultimate for development. JDK/JRE 7 and Eclipse Kepler 4.3.2 were chosen for the project because they are stable releases.
5.3 Selection of Language
Java was chosen for the implementation of the project because the existing code was written in Java. Java offers numerous advantages, as briefed in the section below.
5.3.1 Java Technology
Java's growth over the last 10 years has been nothing short of phenomenal. Java is a high-level, platform-independent programming language. It allows software to be designed and written only once for a "virtual machine" and then run on different computers and operating systems, such as Windows PCs, Macintoshes, and Unix computers. All source code is written in plain text files (for example, with the Notepad editor) saved with the .java extension. The source files are compiled into .class files by the Java compiler (javac). A .class file contains bytecode, the machine language of the Java Virtual Machine (JVM). The java launcher tool runs the application on an instance of the Java Virtual Machine.
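The compile-and-run cycle described above can be seen with a minimal program. The class name HelloCloud is only an example chosen for illustration; the commands mentioned afterwards are the standard javac, java and javap tools shipped with the JDK.

```java
// File: HelloCloud.java (the file name must match the public class name)
public class HelloCloud {

    /** Builds the greeting; kept separate from main so it can be reused and tested. */
    public static String greeting(String who) {
        return "Hello, " + who + "!";
    }

    public static void main(String[] args) {
        // Use the first command-line argument if one is given, otherwise a default.
        String who = args.length > 0 ? args[0] : "cloud";
        System.out.println(greeting(who));
    }
}
```

Compiling with javac HelloCloud.java produces HelloCloud.class, which contains the bytecode; running java HelloCloud ADL starts a JVM that loads this class and prints Hello, ADL!. The generated bytecode can be inspected with javap -c HelloCloud.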
5.3.1.1 How Java Technology Works
Figure 5.1 depicts how Java technology works. Source code is compiled into byte code by the compiler, and the byte code is stored on disk. The byte code is then input to the Java Runtime Environment (JRE): the class loader loads it, and the byte code verifier checks it before execution. The Just-In-Time (JIT) compiler then optimizes repeatedly executed code by compiling it into native code for the host operating system.
Figure 5.1: Working of java
Java is a high-level programming language and a powerful software platform. A full implementation of the Java platform provides the following features:
JDK Tools: The JDK tools support compiling, running, monitoring, debugging, and documenting applications. The main tools used are the javac compiler, the java launcher, and the javadoc documentation tool.
Application Programming Interface (API): The API provides the core functionality of the Java programming language. It offers a wide collection of useful classes ready for use in your own applications, ranging from basic objects to networking and security, XML generation, database access, and much more.
Deployment Technologies: The JDK software provides two deployment technologies, the Java Web Start software and the Java Plug-In software, for deploying applications to end users.
Graphical User Interface Toolkits: The Swing and Java 2D toolkits provide the features needed for Graphical User Interfaces (GUIs).
Integration Libraries: Libraries such as the Java IDL API, the JDBC API, the Java Naming and Directory Interface (JNDI) API, Java RMI, and Java Remote Method Invocation over Internet Inter-ORB Protocol (Java RMI-IIOP) enable database access and the manipulation of remote objects.

5.3.1.2 How Java Technology Changes Our Life
Easy to Start: Since the Java programming language is fully object-oriented, it is simple and easy to learn, especially for programmers already familiar with C or C++.
Easy to Write Code: Program metrics (class counts, method counts, and so on) suggest that a program written in the Java programming language can be up to four times smaller than the same program written in C++.
Write Better Code: The Java programming language encourages good coding practices, and its automatic garbage collection helps avoid memory leaks. Its object orientation, its JavaBeans component architecture, and its wide-ranging, easily extendible API let developers reuse existing, tested code and introduce fewer bugs.
Develop Programs Faster: Because the Java programming language is simpler than C++, development can be up to twice as fast, and programs also require fewer lines of code.
Platform Independence: Programs stay portable and platform independent by avoiding the use of libraries written in other languages.
Write Once, Run Anywhere: Source code written in the Java programming language is compiled into machine-independent byte code that runs consistently on any Java platform.

5.3.2 JavaScript
JavaScript is a script-based programming language that was developed by Netscape Communications Corporation. JavaScript was originally called LiveScript and was renamed JavaScript to indicate its relationship with Java. JavaScript supports the development of both client and server components of Web-based applications. On the client side, it can be used to write programs that are executed by a Web browser within the context of a Web page. On the server side, it can be used to write Web server programs that process information submitted by a Web browser and then update the browser's display accordingly. Even though JavaScript supports both client and server Web programming, we use JavaScript for client-side programming, since most browsers support it. JavaScript is almost as easy to learn as HTML, and JavaScript statements can be included in HTML documents by enclosing them between a pair of scripting tags:
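<script> ... </script>
In HTML5 the type attribute defaults to JavaScript and may be omitted; the older language="JavaScript" attribute is obsolete. For example:
<script type="text/javascript">
  // JavaScript statements
</script>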
5.3.2.1 Few things we can do with JavaScript
Validate the contents of a form and make calculations.
Add scrolling or changing messages to the browser's status line.
Animate images or rotate images that change when we move the mouse
over them.
Detect the browser in use and display different content for different
browsers.
Detect installed plug-ins and notify the user if a plug-in is required.

We can do much more with JavaScript, including creating entire applications.
5.3.2.2 Difference between JavaScript and Java
JavaScript and Java are entirely different languages. Among the most glaring differences, Java applets are generally displayed in a box within the Web document, whereas JavaScript can affect any part of the Web document itself. Likewise, JavaScript is best suited to simple applications and to adding interactive features to Web pages, while Java can be used for highly complex applications.
5.3.3 JDK
The Java Development Kit (JDK) is an implementation of one of the Java SE, Java EE or Java ME platforms, released by Oracle Corporation in the form of a binary product aimed at Java developers on Solaris, Linux, Mac OS X or Windows. Since the introduction of the Java platform, it has been by far the most widely used Software Development Kit (SDK). The JDK forms an extended subset of a software development kit (SDK). In the descriptions that accompany its recent releases for Java SE, EE and ME, Sun acknowledges that under its terminology the JDK forms the subset of the SDK responsible for the writing and running of Java programs.
5.3.4 MySQL
MySQL is the world's most widely used open-source relational database management system (RDBMS). It runs as a server providing multi-user access to a number of databases. SQL stands for Structured Query Language. MySQL is a popular choice of database for use in web applications and is a central component of the widely used LAMP open-source web application software stack. LAMP is an acronym for "Linux, Apache, MySQL, Perl/PHP/Python." Free-software and open-source projects that require a full-featured database management system often use MySQL. Databases come in two flavours: flat databases and relational databases. A relational database, which organizes data into related tables, is often preferred over a flat database, whose records are simply stored on disk like a text file. MySQL is a relational database. Databases are most useful for storing information that fits into logical categories. For example, suppose you want to store information about all the employees in a company. With a database you can group different parts of your business into separate tables to help store your information logically. Example tables might be Employees, Supervisors, and Customers. Each table would then contain columns specific to these three areas. To store information related to each employee, the Employees table might have the following columns: Hire Date, Position, Age, and Salary.
5.3.4.1 Distinct Features of MySQL
Portability: The MySQL RDBMS is available on a wide range of platforms, from PCs to supercomputers, and has also been offered as a multi-user loadable module for Novell NetWare. An application developed on one system can be run on other systems without any modifications.
Compatibility: Because MySQL uses standard SQL, many of its commands can also be used with other relational systems such as the IBM DB2 mainframe RDBMS. The MySQL RDBMS is a high-performance, fault-tolerant DBMS suited to online transaction processing and to handling large database applications.
Multithreaded Server Architecture: MySQL's multithreaded server architecture delivers scalable high performance for a very large number of users on a range of hardware architectures, including symmetric multiprocessors (SMPs) and loosely coupled multiprocessors. Performance is achieved by reducing CPU, I/O, memory and operating system bottlenecks and by optimizing the MySQL server code to minimize internal bottlenecks.
5.3.4.2 Features of MySQL
One of the most popular RDBMSs on the market, owing to its ease of use.
Client/server architecture.
Data independence.
Ensuring data integrity and data security.
Managing data concurrency.
Parallel processing support to speed up data entry and online transaction
processing.
Stored procedures and functions.
5.3.5 HTML5
HTML5 is a core markup language of the Internet, used for structuring and
presenting content on the World Wide Web. It is the fifth revision of the
HTML standard (created in 1990 and standardized as HTML 4 as of 1997) and,
as of December 2012, was a Candidate Recommendation of the World Wide Web
Consortium (W3C). Its core aims have been to improve the language with
support for the latest multimedia while keeping it easily readable by humans
and consistently understood by computers and devices (web browsers,
parsers, etc.). HTML5 is intended to subsume not only HTML 4, but also
XHTML 1 and DOM Level 2 HTML.
Following its immediate predecessors HTML 4.01 and XHTML 1.1,
HTML5 is a response to the fact that the HTML and XHTML in common
use on the World Wide Web are a mixture of features introduced by vari-
ous specifications, along with those introduced by software products such
as web browsers, those established by common practice, and the many syntax
errors in existing web documents. It is also an attempt to define a single
markup language that can be written in either HTML or XHTML syntax.
It includes detailed processing models to encourage more interoperable im-
plementations; it extends, improves and rationalizes the markup available
for documents, and introduces markup and application programming inter-
faces (APIs) for complex web applications. For the same reasons, HTML5
is also a potential candidate for cross-platform mobile applications. Many
features of HTML5 have been built with the consideration of being able to
run on low-powered devices such as smartphones and tablets. In December
2011, research firm Strategy Analytics forecast sales of HTML5 compatible
phones would top 1 billion in 2013.
In particular, HTML5 adds many new syntactic features. These include
the new <video>, <audio> and <canvas> elements, as well as the integration
of scalable vector graphics (SVG) content (replacing uses of the generic
<object> tag) and MathML for mathematical formulas. These features are
designed to make it easy to include and handle multimedia and graphical
content on the web without having to resort to proprietary plugins and APIs.
Other new elements, such as <section>, <article>, <header> and <nav>, are
designed to enrich the semantic content of documents. New attributes have
been introduced for the same purpose, while some elements and attributes
have been removed. Some elements, such as <a>, <cite> and <menu> have
been changed, redefined or standardized. The APIs and Document Object
Model (DOM) are no longer afterthoughts, but are fundamental parts of the
HTML5 specification. HTML5 also defines in some detail the required pro-
cessing for invalid documents so that syntax errors will be treated uniformly
by all conforming browsers and other user agents.
5.3.5.1 Features
Markup
HTML5 introduces elements and attributes that reflect typical usage on
modern websites. Some of them are semantic replacements for common
uses of generic block (<div>) and inline (<span>) elements, for example
<nav> (website navigation block), <footer> (typically the bottom of a web
page or of a section), or <audio> and <video> instead of <object>. Some
deprecated elements from HTML 4.01 have been dropped, including purely
presentational elements such as <font> and <center>, whose effects have
long been superseded by the more capable Cascading Style Sheets. There is
also a renewed emphasis on the importance of
DOM scripting (e.g., JavaScript) in Web behavior. The HTML5 syntax is no
longer based on SGML despite the similarity of its markup. It has, however,
been designed to be backward compatible with common parsing of older
versions of HTML. It comes with a new introductory line that looks like an
SGML document type declaration, <!DOCTYPE html>, which triggers the
standards-compliant rendering mode. As of 5 January 2009, HTML5 also
includes Web Forms 2.0, a previously separate WHATWG specification.
New APIs
In addition to specifying markup, HTML5 specifies scripting application
programming interfaces (APIs) that can be used with JavaScript. Existing
Document Object Model (DOM) interfaces are extended and de facto features
are documented. There are also new APIs, such as:
The canvas element for immediate-mode 2D drawing (see the Canvas 2D API
Specification 1.0)
Timed media playback
Offline Web Applications
Document editing
Drag-and-drop
Cross-document messaging
Browser history management
MIME type and protocol handler registration
Microdata
Web Storage, a key-value pair storage framework that provides behaviour
similar to cookies but with larger storage capacity and an improved API.
Not all of the above technologies are included in the W3C HTML5 spec-
ification, though they are in the WHATWG HTML specification. Some
related technologies, which are not part of either the W3C HTML5 or the
WHATWG HTML specification, are as follows. The W3C publishes speci-
fications for these separately:
Geolocation
Web SQL Database, a local SQL database (no longer maintained).
The Indexed Database API, an indexed hierarchical key-value store
(formerly WebSimpleDB).
HTML5 File API, which handles file uploads and file manipulation.
Directories and System, an API intended to satisfy client-side storage
use cases not well served by databases.
File Writer, an API for writing to files from web applications.
Web Audio API, a high-level JavaScript API for processing and
synthesizing audio in web applications.
ClassList API
HTML5 cannot provide animation within web pages on its own. Additional
JavaScript or CSS3 functionality is necessary for animating HTML elements.
Animation is also possible using JavaScript and HTML 4, and within SVG
elements through SMIL, although browser support of the latter remained
uneven as of 2011.
5.4 Overview of Module Description
5.4.1 Efficient Information Retrieval for Ranked Query:
Efficient Information Retrieval for Ranked Query (EIRQ) allows each user
to choose the rank of his query, which determines the percentage of matched
files to be returned. The basic idea of EIRQ is to construct a
privacy-preserving mask matrix that allows the cloud to filter out a certain
percentage of matched files before returning them to the ADL. This is not
trivial, since the cloud needs to filter out files correctly according to the
rank of the queries without learning anything about the users' queries.
5.4.2 Aggregation and Distribution Layer:
An ADL is deployed in an organization that authorizes its staff to share
data in the cloud. The staff members, as the authorized users, send their
queries to the ADL, which aggregates the user queries and sends a combined
query to the cloud. The cloud then processes the combined query on the file
collection and returns a buffer that contains all of the matched files to the
ADL, which distributes the search results to each user. To aggregate
sufficient queries, the organization may require the ADL to wait for a period
of time before running the schemes, which may incur a certain querying
delay. The ADL is assumed to be trusted by all of the users since it is
deployed within the organization itself, and the communication channels are
assumed to be secured under security protocols such as SSL. Each user sends
his query to the ADL individually, and the ADL distributes the appropriate
files to each user. As long as the ADL is trusted and correctly executes our
schemes, a user cannot learn anything about other users' interests.
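The aggregation performed by the ADL can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the project's actual code: the classes `UserQuery` and `AdlAggregator` are hypothetical, rank 0 is assumed to be the highest rank (so the "highest rank" for a keyword is the smallest rank number), and the ranks 49 and 40 are arbitrary illustrative values. The combined query maps every requested keyword to the highest rank of any query that chose it, which is the value Algorithm 5.1 uses to build a row of the mask matrix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of one user's query as sent to the ADL.
// Assumption: rank 0 is the highest rank; larger numbers are lower ranks.
final class UserQuery {
    final String user;
    final List<String> keywords;
    final int rank;

    UserQuery(String user, List<String> keywords, int rank) {
        this.user = user;
        this.keywords = keywords;
        this.rank = rank;
    }
}

// Hypothetical aggregation step of an ADL (illustrative only).
final class AdlAggregator {
    // Queries collected while the ADL waits before running the scheme; they
    // are kept so that the results can later be divided among the users.
    private final List<UserQuery> pending = new ArrayList<>();

    void submit(UserQuery query) {
        pending.add(query);
    }

    // Builds the combined query: every requested keyword, mapped to the
    // highest rank (smallest rank number) of any pending query that chose it.
    Map<String, Integer> combine() {
        Map<String, Integer> highestRank = new TreeMap<>();
        for (UserQuery q : pending) {
            for (String keyword : q.keywords) {
                Integer current = highestRank.get(keyword);
                if (current == null || q.rank < current) {
                    highestRank.put(keyword, q.rank);
                }
            }
        }
        return highestRank;
    }

    public static void main(String[] args) {
        AdlAggregator adl = new AdlAggregator();
        adl.submit(new UserQuery("alice", Arrays.asList("A", "B"), 49));
        adl.submit(new UserQuery("bob", Arrays.asList("A", "C"), 40));
        System.out.println(adl.combine()); // prints {A=40, B=49, C=40}
    }
}
```

Keeping the pending queries, rather than only the combined map, is what allows the ADL to divide the recovered files among the individual users afterwards.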
5.4.3 Ranked Queries:
To further reduce the communication cost, a differential query service is
provided by allowing each user to retrieve matched files on demand.
Specifically, a user selects a particular rank for his query to determine the
percentage of matched files to be returned. In general, the higher the rank
of the query, the more matched files are returned to the user. This feature
is useful when a large number of files match a user's query but the user
only needs a small subset of them. For example, assume that Alice wants
to retrieve 2% of the files that contain keywords A, B, and Bob wants to
retrieve 20% of the files that contain keywords A, C. Suppose that the cloud
holds 500 files described by keywords A, B and 500 files described by
keywords A, C. Since every file contains keyword A, each user's query
matches all 1000 files. Without combination, the cloud has to return 2000
files (1000 for each user), and without ranking it has to return 1000 files,
but only 110 of them are actually needed (2% of the 500 A, B files plus 20%
of the 500 A, C files, i.e., 10 + 100).
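The arithmetic of this example can be checked with a short Java sketch. The file counts and percentages come from the text; the number of ranks (r = 50) and the per-rank fraction (r - l)/r are assumptions for illustration, based on the row pattern of Algorithm 5.1, where a keyword whose highest query rank is l receives r - l ones out of r columns and rank 0 is the highest rank. Under that assumption Alice's 2% corresponds to rank 49 and Bob's 20% to rank 40. The class name `RankedQueryExample` is hypothetical.

```java
// Worked check of the Alice/Bob example (file counts taken from the text).
final class RankedQueryExample {

    // Number of files a user actually needs: percent% of the files matching the query.
    static int neededFiles(int matchingFiles, int percent) {
        return matchingFiles * percent / 100;
    }

    // Assumption: with r ranks, a rank-l query returns (r - l) / r of its
    // matched files, rank 0 being the highest (row pattern of Algorithm 5.1).
    static double fractionReturned(int rank, int ranks) {
        return (double) (ranks - rank) / ranks;
    }

    public static void main(String[] args) {
        int filesAB = 500; // files described by keywords A, B
        int filesAC = 500; // files described by keywords A, C
        int allFiles = filesAB + filesAC;

        // Every file contains A, so each query matches all 1000 files.
        int withoutCombination = 2 * allFiles; // one full result set per user
        int withoutRanking = allFiles;         // one combined result set
        int needed = neededFiles(filesAB, 2) + neededFiles(filesAC, 20);
        System.out.println(withoutCombination + " " + withoutRanking + " " + needed);
        // prints 2000 1000 110

        int r = 50; // hypothetical number of ranks
        System.out.println(fractionReturned(49, r)); // Alice's 2%
        System.out.println(fractionReturned(40, r)); // Bob's 20%
    }
}
```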

5.4.4 User Privacy
User privacy can be divided into search privacy and access privacy: the
cloud should learn neither what the user is searching for nor which files
are returned to the user. EIRQ additionally protects rank privacy.
Search privacy: In EIRQ, the combined query (the mask matrix) from
the ADL to the cloud is encrypted with the ADL's public key. Therefore,
the cloud cannot deduce what each user is searching for from the
encrypted query.
Access privacy: In EIRQ, the cloud processes every file in the same way
while searching, generating a compact buffer in which unmatched files are
encrypted as 0. The buffer returned to the ADL is encrypted with the ADL's
public key. Therefore, the cloud cannot tell which files are actually
returned from the encrypted buffer.
Rank privacy: In EIRQ, the mask matrix from the ADL to the cloud is a
d-row, r-column matrix, where r is the only additional information leaked
compared with the scheme in [1]. Given r, the cloud only knows that all
users are classified into r ranks, without knowing how many users are in
each rank or which users are in which rank. Therefore, user rank privacy
is protected.
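Search and access privacy rely on the mask entries being encrypted under a public-key scheme on which the cloud can still compute. This section does not name the scheme, so the Java sketch below assumes textbook Paillier encryption (the additively homomorphic system that reference [14] generalises) with a deliberately tiny key of two 256-bit primes: it is illustrative only and not secure. It shows that encryptions of 0 and 1 are random-looking values, that multiplying ciphertexts adds the underlying plaintexts, and that raising a ciphertext to a file value F yields an encryption of 0 when the ciphertext encrypts 0. Encoding a file as the integer 12345 and the class name `ToyPaillier` are hypothetical.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

// Toy textbook Paillier scheme with g = n + 1. Illustrative only: the key
// size is far too small for real use.
final class ToyPaillier {
    private static final SecureRandom RNG = new SecureRandom();

    final BigInteger n;           // public modulus
    final BigInteger nSquared;    // public ciphertext modulus n^2
    private final BigInteger phi; // (p - 1)(q - 1), known only to the ADL
    private final BigInteger mu;  // phi^-1 mod n, known only to the ADL

    ToyPaillier(int primeBits) {
        BigInteger p = BigInteger.probablePrime(primeBits, RNG);
        BigInteger q;
        do {
            q = BigInteger.probablePrime(primeBits, RNG);
        } while (q.equals(p));
        n = p.multiply(q);
        nSquared = n.multiply(n);
        phi = p.subtract(BigInteger.ONE).multiply(q.subtract(BigInteger.ONE));
        mu = phi.modInverse(n);
    }

    // Enc(m) = (1 + n)^m * s^n mod n^2, with fresh randomness s coprime to n.
    BigInteger encrypt(BigInteger m) {
        BigInteger s;
        do {
            s = new BigInteger(n.bitLength(), RNG);
        } while (s.signum() == 0 || s.compareTo(n) >= 0 || !s.gcd(n).equals(BigInteger.ONE));
        BigInteger gm = BigInteger.ONE.add(m.multiply(n)).mod(nSquared); // (1 + n)^m mod n^2
        return gm.multiply(s.modPow(n, nSquared)).mod(nSquared);
    }

    // Dec(c) = L(c^phi mod n^2) * mu mod n, where L(x) = (x - 1) / n.
    BigInteger decrypt(BigInteger c) {
        BigInteger x = c.modPow(phi, nSquared);
        return x.subtract(BigInteger.ONE).divide(n).multiply(mu).mod(n);
    }

    // Public operation: Enc(a) * Enc(b) mod n^2 = Enc(a + b).
    BigInteger add(BigInteger c1, BigInteger c2) {
        return c1.multiply(c2).mod(nSquared);
    }

    // Public operation: Enc(a)^k mod n^2 = Enc(a * k).
    BigInteger scale(BigInteger c, BigInteger k) {
        return c.modPow(k, nSquared);
    }

    public static void main(String[] args) {
        ToyPaillier adl = new ToyPaillier(256);          // ADL's key pair (toy size)
        BigInteger keep = adl.encrypt(BigInteger.ONE);  // mask entry 1
        BigInteger drop = adl.encrypt(BigInteger.ZERO); // mask entry 0
        BigInteger file = BigInteger.valueOf(12345);    // hypothetical file encoding

        // Cloud side: only the public operations add and scale are used.
        BigInteger matched = adl.scale(adl.add(keep, drop), file);
        BigInteger unmatched = adl.scale(drop, file);

        // ADL side: only the private key holder can decrypt.
        System.out.println(adl.decrypt(matched));   // 12345
        System.out.println(adl.decrypt(unmatched)); // 0
    }
}
```

Because every encryption uses fresh randomness, two encryptions of the same mask entry differ, which is why the cloud cannot tell the 1 entries from the 0 entries.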
5.5 Working of modules
The system overview provides the actual view of the application in a
real-time environment. The figure below shows the system overview. As
depicted in the figure, each user generates a query and sends it to the ADL
along with the rank that determines the percentage of matched files to fetch.
The ADL receives requests from multiple users, merges all the queries into a
single query and sends it to the cloud. The cloud filters the required number
of files from the matched files and sends the result back to the ADL, which
distributes the relevant files to the users.
Figure 5.2: Working process of EIRQ
1. Step 1: Each user sends a query to the ADL, where the user query
consists of the chosen keywords and the query rank.
2. Step 2: Given the users' queries, the ADL runs the Matrix Construct
algorithm (Algorithm 5.1) and sends the resulting mask matrix to the cloud.
3. Step 3: Based on the mask matrix, the cloud runs the File Filter
algorithm (Algorithm 5.2) to filter out a certain percentage of matched
files and returns a union buffer to the ADL.
4. Step 4: The ADL first recovers all files that match the user queries using
the File Recover algorithm, and then runs the Result Divide algorithm to
distribute the files to each user.
Algorithm 5.1 Matrix construct
(d is the number of keywords in the dictionary Dic; r is the number of ranks)
for i = 1 to d do
    Set l to be the highest query rank choosing the ith keyword in Dic
    for j = 1 to r do
        if l + j ≤ r then M[i, j] = 1
        else M[i, j] = 0
        Encrypt M[i, j] with the ADL's public key
Algorithm 5.2 File filter
for each file F_j stored in the cloud do
    k = j mod r
    c_j = product of M[i, k] over all i (1 ≤ i ≤ d) such that Dic[i] ∈ F_j
    e_j = c_j^|F_j|
    Multiply the pair (c_j, e_j) into the compact buffer multiple times
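Algorithms 5.1 and 5.2 can be simulated without encryption to see why the mask matrix makes the cloud return only a fraction of the matched files. The following Java sketch is a plaintext model under explicit assumptions, not the project's implementation: file indices and matrix columns are 0-based (because k = j mod r yields 0 to r - 1, the 1-based test l + j ≤ r becomes column < r - l), rank 0 is the highest rank, a keyword chosen by no query is given rank r and thus an all-zero row, and c_j is computed as a plain sum, which is what the product of encrypted entries decrypts to under an additively homomorphic scheme such as Paillier. The class and method names are hypothetical.

```java
import java.util.Arrays;

// Plaintext model of Matrix Construct and File Filter (no encryption).
final class EirqPlaintextModel {

    // Algorithm 5.1, 0-based. highestRank[i] is the highest (numerically
    // smallest) rank of any query choosing keyword i, or r if none chose it.
    static int[][] matrixConstruct(int[] highestRank, int r) {
        int[][] m = new int[highestRank.length][r];
        for (int i = 0; i < highestRank.length; i++) {
            for (int col = 0; col < r; col++) {
                // 1-based "l + j <= r" is equivalent to 0-based "col < r - l".
                m[i][col] = (col < r - highestRank[i]) ? 1 : 0;
            }
        }
        return m; // in EIRQ each entry would now be encrypted for the ADL
    }

    // Algorithm 5.2, plaintext. fileKeywords[j] lists the dictionary indices
    // of the keywords describing file j. Returns the indices of files whose
    // c_j is non-zero, i.e. the files that survive the mask.
    static int[] fileFilter(int[][] mask, int[][] fileKeywords) {
        int r = mask[0].length;
        int[] kept = new int[fileKeywords.length];
        int count = 0;
        for (int j = 0; j < fileKeywords.length; j++) {
            int k = j % r; // column assigned to file j
            int c = 0;     // plaintext of c_j: sum of M[i][k] over keywords of F_j
            for (int i : fileKeywords[j]) {
                c += mask[i][k];
            }
            if (c > 0) {   // e_j encrypts 0 whenever c_j does, so only these count
                kept[count++] = j;
            }
        }
        return Arrays.copyOf(kept, count);
    }

    public static void main(String[] args) {
        int r = 10; // hypothetical number of ranks
        // Dictionary index 0 = "A" (chosen by no query), 1 = "B" (highest rank 7).
        int[][] mask = matrixConstruct(new int[] {r, 7}, r);
        int[][] files = new int[100][];
        for (int j = 0; j < files.length; j++) {
            files[j] = new int[] {1}; // every file is described by keyword B
        }
        // Rank 7 of 10 keeps columns 0..2, i.e. (10 - 7) / 10 of the 100 files.
        System.out.println(fileFilter(mask, files).length); // prints 30
    }
}
```

A file described by several requested keywords survives if any of its keywords has a 1 in the file's column, so each file is effectively ranked by the highest rank of the queries it matches.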

5.6 Snapshots
5.6.1 Home page
Figure 5.3: Home page
5.6.2 Organisation Login
Figure 5.4: Organisation login
5.6.3 Organisation Home
Figure 5.5: Organisation home
5.6.4 User Details
Figure 5.6: User details
5.6.5 File Upload
Figure 5.7: File upload
5.6.6 Cloud Login
Figure 5.8: Cloud login
5.6.7 Cloud Home
Figure 5.9: Cloud Home
5.6.8 Cloud Queries
Figure 5.10: Cloud queries

5.6.9 ADL Login
Figure 5.11: ADL login
5.6.10 ADL Home
Figure 5.12: ADL home
5.6.11 User Sign Up
Figure 5.13: User sign up
5.6.12 User Home
Figure 5.14: User home
5.6.13 User Login
Figure 5.15: User login
5.6.14 Ostrovsky Query
Figure 5.16: Ostrovsky query
5.6.15 Ostrovsky Result
Figure 5.17: Ostrovsky result
5.6.16 EIRQ Query
Figure 5.18: EIRQ query

5.6.17 EIRQ Result
Figure 5.19: EIRQ Result

Chapter 6
Testing
6.1 Introduction to Testing
Testing is the process of executing a program with the intent of finding
errors; it is a set of activities that can be planned in advance and conducted
systematically. Inadequate or missing testing leads to errors. Nothing is
complete without testing, as it is vital to the success of the system.
The developed system is tested using various kinds of data. During testing,
errors are noted and corrected. The purpose of system testing is to find and
correct the errors in the system. In general, testing demonstrates that the
system works according to the specifications and that it meets the
performance requirements.
System testing is done to check whether the system works accurately and
efficiently before live operation commences. A small error can conceivably
explode into a much larger problem. Effective testing early in the process
translates directly into long-term cost savings from a reduced number of
errors.
6.2 Approach
The code is tested incrementally, because the system was developed
bottom-up: every function is developed individually and all the functions
are integrated later. To test the individual functions separately, white-box
testing is used. Every line of the code is checked for proper functioning,
each control path is tested, and the errors found are eliminated.
6.3 Test Environment
The software was tested on the following platform: Windows 7 (64-bit)
operating system, Eclipse Kepler 4.3.2, and JDK/JRE 7.
6.4 Unit testing
Unit testing is the process of testing individual components in the system.
This is a defect testing process, so its goal is to expose faults in these
components. Different types of components may be tested at this stage.
Individual functions or methods are the simplest type of component, and
our tests are a set of calls to these routines with different parameters.
All the independent paths were exercised to ensure that every statement in
the module is executed at least once, and all the error-handling paths were
tested. At the end of this testing phase, each unit was found to work
satisfactorily with regard to the expected output from the module.

6.4.1 Unit test cases
Test case             | Input                       | Outcome
User account creation | New username and password   | A username and password are generated for each client
User login            | Valid username and password | The user is logged into the user account
File upload           | File, keyword               | Files are uploaded to the cloud server along with the corresponding keywords
File search           | Keyword, rank               | A certain percentage of the matched files is retrieved from the cloud
Table 6.1: Unit testing
6.5 Integration Testing
In integration testing, several individually tested modules are combined
into a subsystem, which is then tested. The goal of integration testing is to
see whether the modules can be integrated properly. The implementation of
the system features may be spread across a number of components, so testing
a new feature may require several different components to be integrated.
The testing may reveal errors in the interactions between these individual
components and other parts of the system. Repairing errors may be difficult
because a fix can affect the whole group of components that implement the
system feature.
Furthermore, when a new component is integrated and tested, it can
change the pattern of the previous, already tested component interactions.
Errors may be revealed that were not exposed in the tests of the simpler
configuration.
There are two main types of integration testing: top-down and bottom-up.
These strategies reflect different approaches to system integration. In
top-down integration, the high-level components of a system are integrated
and tested before their design and implementation have been completed. In
bottom-up integration, low-level components are integrated and tested
before the higher-level components have been developed.

6.6 System Testing
The whole system has been tested for functionality through relevant test
cases. Tests were also run to check whether the data is stored properly, and
all links were tested for functionality. After implementation, the
functionality of the whole system was checked with relevant test cases.

Conclusion and Future Scope
The proposed EIRQ schemes based on an ADL provide differential query
services while protecting user privacy. Using this system, a user can
retrieve different percentages of matched files by specifying queries of
different ranks. By further reducing the communication cost incurred in the
cloud, the EIRQ schemes make the private searching technique more
applicable to a cost-efficient cloud environment.
However, in the EIRQ schemes, the rank of each file is simply determined by
the highest rank of the queries it matches. In future work, we will try to
design a flexible ranking mechanism for the EIRQ schemes.
Bibliography
[1] R. Ostrovsky and W. Skeith III, "Private searching on streaming data," in Proc. of CRYPTO, 2005.
[2] P. Mell and T. Grance, "The NIST definition of cloud computing (draft)," NIST Special Publication, 2011.
[3] R. Curtmola, J. Garay, S. Kamara, and R. Ostrovsky, "Searchable symmetric encryption: improved definitions and efficient constructions," in Proc. of ACM CCS, 2006.
[4] J. Bethencourt, D. Song, and B. Waters, "New constructions and practical applications for private stream searching," in Proc. of IEEE S&P, 2006.
[5] J. Bethencourt, D. Song, and B. Waters, "New techniques for private stream searching," ACM Transactions on Information and System Security, 2009.
[6] G. Danezis and C. Diaz, "Improving the decoding efficiency of private search," IACR ePrint Archive, number 024, 2006.
[7] G. Danezis and C. Diaz, "Space-efficient private search with applications to rateless codes," in Financial Cryptography and Data Security, 2007.
[8] B. Chor, O. Goldreich, E. Kushilevitz, and M. Sudan, "Private information retrieval," Journal of the ACM, 1995.
[9] C. Wang, N. Cao, J. Li, K. Ren, and W. Lou, "Secure ranked keyword search over encrypted cloud data," in Proc. of IEEE ICDCS, 2010.
[10] A. Boldyreva, N. Chenette, Y. Lee, and A. O'Neill, "Order-preserving symmetric encryption," in Advances in Cryptology (EUROCRYPT), 2009.
[11] N. Cao, C. Wang, M. Li, K. Ren, and W. Lou, "Privacy-preserving multi-keyword ranked search over encrypted cloud data," in Proc. of IEEE INFOCOM, 2011.
[12] W. Wong, D. Cheung, B. Kao, and N. Mamoulis, "Secure kNN computation on encrypted databases," in Proc. of ACM SIGMOD, 2009.
[13] R. Curtmola, J. Garay, S. Kamara, and R. Ostrovsky, "Searchable symmetric encryption: improved definitions and efficient constructions," in Proc. of ACM CCS, 2006.
[14] I. Damgård and M. Jurik, "A generalisation, a simplification and some applications of Paillier's probabilistic public-key system," in Proc. of PKC, 2001.
[15] R. Ostrovsky and W. Skeith III, "Private searching on streaming data," Journal of Cryptology, 2007.

Dept. of Computer Science & Engg., KLE Dr. MSSCET, Belgaum
