Main Report
1 Introduction 1
1.1 Cloud computing . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 What is cloud? . . . . . . . . . . . . . . . . . . . . 2
1.1.2 What is cloud computing? . . . . . . . . . . . . . . 2
1.1.3 Classification of services provided by cloud . . . . . 2
1.1.4 Types of clouds . . . . . . . . . . . . . . . . . . . 4
1.2 Problem Statement . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Aim and objective . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Overview of project . . . . . . . . . . . . . . . . . . . . . . 6
1.5 Issues and Challenges . . . . . . . . . . . . . . . . . . . . 7
2 Literature Survey 8
2.1 Previous Research Work . . . . . . . . . . . . . . . . . . . 8
2.2 Existing System . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3 Proposed System . . . . . . . . . . . . . . . . . . . . . . . 10
3.1.5 Assumptions and Dependencies . . . . . . . . . . . 15
3.2 Specific Requirements . . . . . . . . . . . . . . . . . . . . 16
3.2.1 Functional Requirement . . . . . . . . . . . . . . . 16
3.2.1.1 Purpose . . . . . . . . . . . . . . . . . . 16
3.2.1.2 Process . . . . . . . . . . . . . . . . . . . 17
3.2.2 Non Functional Requirement . . . . . . . . . . . . . 17
3.2.3 Software Requirements . . . . . . . . . . . . . . . 18
3.2.4 Hardware Requirements . . . . . . . . . . . . . . . 18
4 System Design 19
4.1 Design Considerations . . . . . . . . . . . . . . . . . . . . 20
4.2 Development Method . . . . . . . . . . . . . . . . . . . . 20
4.3 System Architecture . . . . . . . . . . . . . . . . . . . . . . 22
4.4 Flow charts . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.4.1 Flowchart for User . . . . . . . . . . . . . . . . . . 23
4.4.2 Flowchart for ADL . . . . . . . . . . . . . . . . . 23
4.4.3 Flowchart for Admin . . . . . . . . . . . . . . . . . 23
4.4.4 Flowchart for Cloud . . . . . . . . . . . . . . . . . 24
4.5 Data Flow Diagram . . . . . . . . . . . . . . . . . . . . . . 24
4.6 Use case Diagram . . . . . . . . . . . . . . . . . . . . . . . 26
4.6.1 Use case Diagram for Users . . . . . . . . . . . . . 27
4.6.2 Use case Diagram for ADL . . . . . . . . . . . . . 27
4.6.3 Use case Diagram for Administrator . . . . . . . . . 28
4.6.4 Use case Diagram for Cloud . . . . . . . . . . . . . 28
4.7 Sequence Diagram . . . . . . . . . . . . . . . . . . . . . . 29
5 Implementation 31
5.1 Implementation Requirements . . . . . . . . . . . . . . . . 31
5.2 Selection of the Platform . . . . . . . . . . . . . . . . . . . 32
5.3 Selection of Language . . . . . . . . . . . . . . . . . . . . 32
5.3.1 Java Technology . . . . . . . . . . . . . . . . . . . 32
5.3.1.1 How Java Technology Works . . . . . . . 32
5.3.1.2 How Java Technology Changes Our Life . 34
5.3.2 JavaScript . . . . . . . . . . . . . . . . . . . . . . . 35
5.3.2.1 Few things we can do with JavaScript . . 35
5.3.2.2 Difference between JavaScript and Java . . 36
5.3.3 JDK . . . . . . . . . . . . . . . . . . . . . . . . . 36
5.3.4 MySQL . . . . . . . . . . . . . . . . . . . . . . . 36
5.3.4.1 Distinct Features of MySQL . . . . . . . 37
5.3.4.2 FEATURES OF MYSQL . . . . . . . . . 38
5.3.5 HTML5 . . . . . . . . . . . . . . . . . . . . . . . 38
5.3.5.1 Features . . . . . . . . . . . . . . . . . . 40
5.4 Overview of Module Description . . . . . . . . . . . . . . 42
5.4.1 Efficient Information Retrieval for Ranked Query: . 42
5.4.2 Aggregation and Distribution Layer: . . . . . . . . . 42
5.4.3 Ranked Queries: . . . . . . . . . . . . . . . . . . . 43
5.4.4 User Privacy . . . . . . . . . . . . . . . . . . . . . 44
5.5 Working of modules . . . . . . . . . . . . . . . . . . . . . 44
5.6 Snap Shots . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.6.1 Home page . . . . . . . . . . . . . . . . . . . . . . 46
5.6.2 Organisation Login . . . . . . . . . . . . . . . . . . 46
5.6.3 Organisation Home . . . . . . . . . . . . . . . . . . 46
5.6.4 User Details . . . . . . . . . . . . . . . . . . . . . . 46
5.6.5 File Upload . . . . . . . . . . . . . . . . . . . . . . 46
5.6.6 Cloud Login . . . . . . . . . . . . . . . . . . . . . 46
5.6.7 Cloud Home . . . . . . . . . . . . . . . . . . . . . 46
5.6.8 Cloud Queries . . . . . . . . . . . . . . . . . . . . 46
5.6.9 ADL Login . . . . . . . . . . . . . . . . . . . . . . 47
5.6.10 ADL Home . . . . . . . . . . . . . . . . . . . . . . 47
5.6.11 User Sign Up . . . . . . . . . . . . . . . . . . . . . 47
5.6.12 User Home . . . . . . . . . . . . . . . . . . . . . . 47
5.6.13 User Login . . . . . . . . . . . . . . . . . . . . . . 47
5.6.14 Ostrovsky Query . . . . . . . . . . . . . . . . . . 47
5.6.15 Ostrovsky result . . . . . . . . . . . . . . . . . . . 47
5.6.16 EIRQ Query . . . . . . . . . . . . . . . . . . . . . 47
5.6.17 EIRQ Result . . . . . . . . . . . . . . . . . . . . . 48
6 Testing 49
6.1 Introduction to Testing . . . . . . . . . . . . . . . . . . . . 49
6.2 Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
6.3 Test Environment . . . . . . . . . . . . . . . . . . . . . . . 50
6.4 Unit testing . . . . . . . . . . . . . . . . . . . . . . . . . . 50
6.4.1 Unit test cases . . . . . . . . . . . . . . . . . . . . 51
6.5 Integration Testing . . . . . . . . . . . . . . . . . . . . . . 51
6.6 System Testing . . . . . . . . . . . . . . . . . . . . . . . . 52
List of Figures
5.8 Cloud login . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.9 Cloud Home . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.10 Cloud queries . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.11 ADL login . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.12 ADL home . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.13 User sign up . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.14 User home . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.15 User login . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.16 Ostrovsky query . . . . . . . . . . . . . . . . . . . . . . . 47
5.17 Ostrovsky result . . . . . . . . . . . . . . . . . . . . . . . 47
5.18 EIRQ query . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.19 EIRQ Result . . . . . . . . . . . . . . . . . . . . . . . . . . 48
List of Tables
List of Algorithms
Chapter 1
Introduction
Efficient Information Retrieval in Cloud Environment with Privacy Preserving
The "cloud" is a set of different types of hardware and software that work
collectively to deliver many aspects of computing to the end-user as an on-
line service.
Based upon the services offered, clouds are classified in the following ways:
Infrastructure as a service (IaaS) involves offering hardware-related
services using the principles of cloud computing. These could include
storage services (database or disk storage) or virtual servers.
Leading Infrastructure as a service offerings include Amazon EC2,
Amazon S3, Rackspace Cloud Servers and Flexiscale.
Storage-as-a-service
Database-as-a-service
Information-as-a-service
Process-as-a-service
Application-as-a-service
Platform-as-a-service
Integration-as-a-service
Security-as-a-service
Management/Governance-as-a-service
Testing-as-a-service
Infrastructure-as-a-service
Private clouds are of two types: on-premise private clouds and
externally hosted private clouds. Externally hosted private clouds are
also used exclusively by one organization, but are hosted by a third
party specializing in cloud infrastructure. Externally hosted private
clouds are cheaper than on-premise private clouds.
The aim of the project is to make private search applicable in a cloud
environment by reducing its computational and communication costs. The
project also focuses on preserving user privacy, which can be divided
into search privacy and access privacy: the cloud learns neither what
the user is searching for nor which files are returned to the user.
For instance, consider the application scenario shown in Fig. 1.3.
In the traditional method, an organization subscribes to cloud services
and authorizes its staff to store files in the cloud. Each file is
described by a set of keywords when it is uploaded, and authorized users
can retrieve the files of interest by querying the cloud with relevant
keywords. Since the cloud is operated by a third party, there are
concerns over possible privacy leaks. Such concerns have led
researchers to propose various techniques to protect user privacy.
Alternatively, if several queries can be combined, the overhead can be
reduced, because the server then has fewer queries to process.
files he wants to be returned. If users are allowed to retrieve matched
files according to their demand, the overall bandwidth consumed while
accessing files from the cloud can be reduced.
When the users want to search for some files, they will send a query to the
cloud with certain keywords. The cloud will evaluate the query and return
the necessary files to the users. During this process, the cloud will know
what files the user is interested in from observing the query and the type of
the files returned to that user. Preventing a leak of this type of information
to the cloud is difficult since the cloud must have access to the information
to efficiently return the appropriate files to the users.
Literature Survey
can be protected in private searching and PIR, but only search privacy can
be protected in searchable encryption.
Private searching was first proposed in [1, 15], where data is stored
in the clear and the query is encrypted with the Paillier cryptosystem
[14], which exhibits homomorphic properties. Ranked searchable
encryption enables users to retrieve the best-matching files from the
cloud when both the query and the data are encrypted. The work in [9],
which supports only single-keyword searches, encrypts files and queries
with Order Preserving Symmetric Encryption (OPSE) [10] and uses keyword
frequency to rank results. The follow-up work [11], which supports
multiple-keyword searches, uses the secure KNN technique [12] to rank
results based on inner products. The main limitation of these
approaches is that user access privacy [13] is not preserved.
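The homomorphic property mentioned above can be illustrated with a toy version of the Paillier cryptosystem. The sketch below is not the project's implementation: the primes are deliberately tiny, the generator is assumed to be g = n + 1, and the function names are chosen for illustration only. It shows that multiplying two ciphertexts adds the hidden plaintexts, and that raising a ciphertext to a power multiplies the hidden plaintext by that power.

```python
import math
import random

# Toy primes for illustration only; real deployments use much larger primes.
P, Q = 61, 53
N = P * Q
N_SQ = N * N
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p - 1, q - 1)
MU = pow(LAMBDA, -1, N)  # valid because the generator is assumed to be g = n + 1


def encrypt(m):
    """Encrypt an integer m (0 <= m < N) under the public key (N, g = N + 1)."""
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (pow(N + 1, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c):
    """Recover the plaintext with the private key (LAMBDA, MU)."""
    return ((pow(c, LAMBDA, N_SQ) - 1) // N * MU) % N


def add_encrypted(c1, c2):
    """E(a) * E(b) mod n^2 decrypts to a + b (mod n)."""
    return (c1 * c2) % N_SQ


def scale_encrypted(c, k):
    """E(a) ** k mod n^2 decrypts to k * a (mod n)."""
    return pow(c, k, N_SQ)
```

For example, decrypt(add_encrypted(encrypt(20), encrypt(22))) yields 42 although the party doing the multiplication never sees 20 or 22.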
A key private search solution was proposed by Ostrovsky et al. [1, 15].
It allows a user to retrieve files of interest from an untrusted server
without leaking any information, and it provides the same level of
privacy as downloading the entire database from the cloud at a
significantly lower communication cost. (If a user asks the cloud to
return the entire database, the cloud cannot know which files the user
is really interested in.) However, the Ostrovsky scheme has a high
computational cost, since it requires the cloud to process the query on
every file in the collection. Otherwise, the cloud would learn that the
files it did not process are of no interest to the user. This quickly
becomes a performance bottleneck when the cloud needs to process
thousands of queries over a collection of hundreds of thousands of
files.
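The reason the cloud must touch every file can be seen in a simplified sketch of Ostrovsky-style query processing. Everything here is an illustrative assumption: the toy Paillier parameters, the function names, the use of a small integer as a stand-in for file content, and the omission of the buffer into which the real scheme scatters its results. The point is that the cloud performs the same operations on matching and non-matching files, so it cannot tell them apart.

```python
import math
import random

P, Q = 61, 53  # toy primes, illustration only
N = P * Q
N_SQ = N * N
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAMBDA, -1, N)


def encrypt(m):
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (pow(N + 1, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c):
    return ((pow(c, LAMBDA, N_SQ) - 1) // N * MU) % N


def build_query(dictionary, wanted):
    """User side: one ciphertext per dictionary keyword, E(1) if wanted, else E(0)."""
    return [encrypt(1 if kw in wanted else 0) for kw in dictionary]


def process_file(query, dictionary, file_keywords, content):
    """Cloud side: runs on EVERY file without learning whether it matches."""
    c = 1  # a trivial encryption of 0
    for i, kw in enumerate(dictionary):
        if kw in file_keywords:
            c = (c * query[i]) % N_SQ  # homomorphically adds the hidden query bit
    e = pow(c, content, N_SQ)  # encrypts match_count * content
    return c, e


def recover(c, e):
    """User side: content of a matching file, or None for a non-matching one.

    Assumes match_count * content < N so that no modular wrap-around occurs.
    """
    count = decrypt(c)
    return decrypt(e) // count if count else None
```

With the dictionary ["A", "B", "C"] and a query for {"A", "C"}, a file tagged {"A", "B"} with content 42 is recovered as 42, while a file tagged {"B"} decrypts to a zero match count and is discarded by the user.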
files that contain keywords A, C. Suppose that the cloud holds 500 files
described by keywords A, B and 500 files described by keywords A, C.
Without combination, the cloud will have to return 2000 files, and
without ranking, the cloud will have to return 1000 files, of which only
110 are actually needed.
Upload the data file, along with its keywords, to the server node
(physical storage or cloud).
Specify the rank to fetch the required percentage of matched files. The
searching activity is commonly performed by both the admin and the
user.
3.1.4 Pre-requisites
The user should have one of the operating systems Windows, Linux or
Mac installed on a system with the hardware specification mentioned in
the subsequent section.
The user should possess basic knowledge of how to host the application
in the web browser.
The user should be aware of the type of input that has to be given to
the application.
The following assumptions are made in the development of the project:
the data file must be uploaded on the server side, together with the
appropriate keywords, by the administrator only. The project also
assumes that the server never goes down during the upload, so that
availability is maintained.
The following are the dependencies of the project: the project is coded
in Java and requires a properly installed Eclipse environment, so
anyone who wishes to develop it further should know this programming
language. The network must be configured properly before the
application is executed.
This section of the SRS contains all the software requirements at a
level of detail sufficient to enable designers to design a system that
satisfies those requirements. It also helps testers design test cases
to verify whether the system satisfies the specified requirements.
In this application, forms for login page, search page etc. are created
for users and administrator to give a layer of security.
Buttons and text fields are provided in order to take the input from the
user.
Error messages are provided for invalid inputs and separate alert mes-
sages for the display of successful outcomes like creation or deletion
of users.
A normal user can search for files and download them from the cloud
by giving the appropriate keywords.
To upload the data file, a scroll box must be provided in order to
select the required file from the client machine.
The ADL takes the requests from multiple users, merges them into a
single query and sends it to the cloud.
The cloud, after filtering the files, returns them to the ADL, which
distributes them among the users according to their ranks. Eclipse provides classes and
3.2.1.2 Process
This service is portable and can run on any system having a minimum hard-
ware configuration as mentioned:
Scripts: JavaScript
Database: MySQL
Hard Disk: 20 GB
System Design
The project uses the waterfall lifecycle model for development. The
waterfall model is an activity-centered lifecycle model that proceeds
step by step: all the requirements of one activity are completed before
the design of that activity is started. The entire project design is
broken down into several small tasks in order of precedence, and these
tasks are designed one by one, making sure that each works correctly.
Once one of these small tasks is completed, another task that depends
on it can be started. Each step, once completed, is verified to ensure
that it works, is error-free and meets all the requirements. This
lifecycle model was chosen primarily for two reasons. The first is
simplicity: with the waterfall model the entire project can be broken
down into smaller activities that can be converted relatively easily
into code, and combining them yields the code for the whole project.
The second
System & Software Design: In this stage the system specifications are
translated into a software representation. The software engineer at this
stage is concerned with data structure, software architecture and inter-
face representations. In this phase System and Software Design was
carried out. The system architecture was decided and Class Diagrams
and Sequence Diagrams were drawn.
Implementation: In this stage the designs are translated into the soft-
ware domain. In this phase the actual coding of the project was done.
Large systems are always decomposed into sub-systems that provide some
related set of services. The initial design process of identifying these sub-
systems and establishing a framework for sub-system control and communi-
cation is called architecture design and the output of this design process is a
description of the software architecture. The architectural design process is
concerned with establishing a basic structural framework for a system. It in-
volves identifying the major components of the system and communications
between these components. The following sub-sections explore the
design aspects and the sub-systems involved in this software package.
find flaws, bottlenecks, and other less-obvious features within it. There are
many different types of flowcharts, and each type has its own repertoire of
boxes and notational conventions.
The two most common types of boxes in a flowchart are:
a processing step, usually called activity, and denoted as a rectangular
box
a decision, usually denoted as a diamond.
A flowchart is described as "cross-functional" when the page is divided
into different swim-lanes describing the control of different organizational
units. A symbol appearing in a particular "lane" is within the control of that
organizational unit. This technique allows the author to locate the respon-
sibility for performing an action or making a decision correctly, showing
the responsibility of each organizational unit for different parts of a single
process. Flowcharts depict certain aspects of processes and they are usually
complemented by other types of diagram. For instance, Kaoru Ishikawa de-
fined the flowchart as one of the seven basic tools of quality control, next
to the histogram, Pareto chart, check sheet, control chart, cause-and-effect
diagram, and the scatter diagram.
2. DFDs can provide a high level system overview, complete with bound-
aries and connections to other systems.
DFDs help system designers and others during initial analysis stages visual-
ize a current system or one that may be necessary to meet new requirements.
Systems analysts prefer working with DFDs, particularly when they require
a clear understanding of the boundary between existing systems and postu-
lated systems. DFDs represent the following:
that must be present in order for the system to do its job, and shows the
flow of data between the various parts of the system. Data flow diagrams
were proposed by Larry Constantine, the original developer of structured
design, based on Martin and Estrin's "data flow graph" model of
computation. Data flow diagrams are one of the three essential
perspectives of the structured-systems analysis and design method
(SSADM). The sponsor of a project and the end users need to be briefed
and consulted throughout all stages of a system's evolution. With a data
flow diagram, users are able to visualize how the system will operate,
what the system will accomplish, and how the system will be implemented.
The old system's data flow diagrams can be drawn up and compared with
the new system's data flow diagrams to identify ways of implementing a
more efficient system. Data flow diagrams can give the end user a
concrete idea of where the data they input ultimately has an effect on
the structure of the whole system, from order to dispatch to report. How
any system is developed can be determined through a data flow diagram
model. In the course of developing a set of levelled data flow diagrams,
the analyst/designer is forced to address how the system may be
decomposed into component sub-systems, and to identify the transaction
data in the data model. Data flow diagrams can be used in both the
Analysis and Design phases of the SDLC.
Roles of the actors in the system can be depicted. Use Case diagrams are
formally included in two modelling languages defined by the OMG: the
Unified Modelling Language (UML) and the Systems Modelling Language
(SysML). Diagram building blocks:
Actor: ADL
Precondition: ADL should have the keywords.
Description: The ADL waits for keywords for a certain period of time,
merges all the queries from different users into a single query and
sends it to the cloud. After receiving the merged result from the cloud,
the ADL distributes the result among the users according to their
keywords and rank.
Description: The cloud monitors the uploaded files and the user queries.
Based on the user query, the cloud fetches the matching files from its
storage and sends them to the ADL.
UML sequence diagrams are used to represent or model the flow of mes-
sages, events and actions between the objects or components of a system.
Time is represented in the vertical direction showing the sequence of inter-
actions of the header elements, which are displayed horizontally at the top
of the diagram. Sequence Diagrams are used primarily to design, document
and validate the architecture, interfaces and logic of the system by describ-
ing the sequence of actions that need to be performed to complete a task
or scenario. UML sequence diagrams are useful design tools because they
provide a dynamic view of the system behaviour, which can be difficult to
extract from static diagrams or specifications. A sequence diagram is an
interaction diagram that shows how processes operate with one another and
in what order. It is a construct of a Message Sequence Chart. A sequence
diagram shows object interactions arranged in time sequence. It depicts the
objects and classes involved in the scenario and the sequence of messages
exchanged between the objects needed to carry out the functionality of the
scenario. Sequence diagrams are typically associated with use case
realizations in the Logical View of the system under development.
Sequence diagrams are sometimes called event diagrams or event
scenarios. A sequence diagram shows, as parallel vertical lines
(lifelines), different processes or
objects that live simultaneously, and, as horizontal arrows, the messages ex-
changed between them, in the order in which they occur. This allows the
specification of simple runtime scenarios in a graphical manner. If the life-
line is that of an object, it demonstrates a role. Note that leaving the instance
Implementation
For the implementation of the project, Java was chosen because the
existing code was in Java. Java offers numerous advantages, as outlined
in the section below. Java's growth over the last 10 years has been
nothing short of phenomenal. Java is a high-level, platform-independent
programming language. It is a well-known technology that allows software
to be designed and written only once for a "virtual machine" and then
run on different computers and operating systems, such as Windows PCs,
Macintoshes and Unix computers. In the Java programming language, all
source code is written in plain text files (for example, in the Notepad
editor) saved with the .java extension. The source files are compiled
into .class files by the javac compiler. A .class file contains
bytecodes, the machine language of the Java Virtual Machine (JVM). The
java launcher tool then runs the application with an instance of the
Java Virtual Machine.
The diagram below depicts how Java technology works. Source code is
compiled into bytecode by the compiler and stored on disk. The bytecode
is the input to the Java Runtime Environment (JRE): the class loader
loads it and the bytecode verifier checks it before execution. The
Just-In-Time (JIT) compiler then optimizes frequently executed code, and
its output is native code for the host operating system.
Write once, run on any Java platform: Source code written in the Java
programming language is compiled into machine-independent bytecodes
that run consistently on any Java platform.
5.3.2 JavaScript
Animate images or rotate images that change when we move the mouse
over them.
Detect the browser in use and display different content for different
browsers.
JavaScript and Java are entirely different languages. Two of the most
glaring differences are the following. Java applets are generally
displayed in a box within the Web document, whereas JavaScript can
affect any part of the Web document itself. JavaScript is best suited
to simple applications and to adding interactive features to Web pages,
whereas Java can be used for incredibly complex applications.
5.3.3 JDK
5.3.4 MySQL
MySQL is one of the world's most widely used open source relational
database management systems (RDBMS). It runs as a server providing
multi-user access to a number of databases. SQL stands for Structured
Query Language. MySQL is a popular choice of database for use in web
applications, and is a central component of the widely used LAMP open
source web application software stack. LAMP is an acronym for "Linux,
Apache, MySQL, Perl/PHP/Python." Free and open source software projects
that require a full-featured database management system often use
MySQL. A database comes in two flavours: a flat database and a
relational database. A relational database is much closer to the way
people think about data and is often preferred over a flat database,
which is simply stored on disk like a text file and is hard to read.
MySQL is a relational database.
Databases are most useful when it comes to storing information that
fits into logical categories. For example, say that you wanted to store
information about all the employees in a company. With a database you
can group different parts of your business into separate tables to help
store your information logically. Example tables might be: Employees,
Supervisors, and Customers. Each table would then contain columns
specific to these three areas. To help store information related to
each employee, the Employees table might have the following columns:
Hire Date, Position, Age, and Salary.
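The Employees table described above can be sketched as follows. To keep the example runnable without a database server, it uses Python's built-in sqlite3 module as a stand-in for MySQL, so the column types are SQLite's rather than MySQL's. The column names (including reading the hire date as a single column), the id key and the sample rows are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a MySQL server.
conn = sqlite3.connect(":memory:")

# One table per logical category; only Employees is shown here.
conn.execute(
    """
    CREATE TABLE Employees (
        id        INTEGER PRIMARY KEY,
        hire_date TEXT,
        position  TEXT,
        age       INTEGER,
        salary    INTEGER
    )
    """
)

# Made-up sample rows, for illustration only.
rows = [
    ("2012-03-01", "Engineer", 29, 52000),
    ("2010-07-15", "Supervisor", 41, 68000),
    ("2013-01-10", "Engineer", 25, 48000),
]
conn.executemany(
    "INSERT INTO Employees (hire_date, position, age, salary) VALUES (?, ?, ?, ?)",
    rows,
)
conn.commit()

# A relational query: all engineers, highest salary first.
engineers = conn.execute(
    "SELECT position, age, salary FROM Employees "
    "WHERE position = ? ORDER BY salary DESC",
    ("Engineer",),
).fetchall()
```

Supervisors and Customers would be further tables of the same kind, each with its own columns.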
Client/server architecture.
Data independence.
Parallel processing support to speed up data entry and the online
transaction processing used by applications.
5.3.5 HTML5
Model (DOM) are no longer afterthoughts, but are fundamental parts of the
HTML5 specification. HTML5 also defines in some detail the required pro-
cessing for invalid documents so that syntax errors will be treated uniformly
by all conforming browsers and other user agents.
5.3.5.1 Features
Markup
Document editing
Drag-and-drop
Cross-document messaging
Microdata
Not all of the above technologies are included in the W3C HTML5 spec-
ification, though they are in the WHATWG HTML specification. Some
related technologies, which are not part of either the W3C HTML5 or the
WHATWG HTML specification, are as follows. The W3C publishes speci-
fications for these separately:
Geolocation
Web Audio API, a high-level JavaScript API for processing and syn-
thesizing audio in web applications.
Efficient Information Retrieval for Ranked Query (EIRQ) allows each user
to choose the rank of his query, which determines the percentage of
matched files to be returned. The basic idea of EIRQ is to construct a
privacy-preserving mask matrix that allows the cloud to filter out a
certain percentage of matched files before returning them to the ADL.
This is not trivial, since the cloud needs to filter out files correctly
according to the rank of the queries without learning anything that
would compromise user privacy.
collection and returns a buffer that contains all of matched files to the ADL,
which will distribute the search results to each user. To aggregate sufficient
queries, the organization may require the ADL to wait for a period of time
before running the schemes, which may incur a certain querying delay. The
ADL is assumed to be trusted by all of the users since it is deployed within
the organization itself, and the communication channels are assumed to be
secured under security protocols like SSL. Each user individually sends
the query to the ADL, which will distribute the appropriate files to
each user. As long as the ADL is trusted and correctly executes our
schemes, a user cannot learn anything about other users' interests.
User privacy can be divided into search privacy and access privacy, where
the cloud neither learns what the user is searching for nor which files are
returned to a user.
Search privacy: In EIRQ, the combined query (the mask matrix) sent from
the ADL to the cloud is encrypted with the ADL's public key. Therefore,
the cloud cannot deduce what each user is searching for from the
encrypted query.
Rank privacy: In EIRQ, the mask matrix from the ADL to the cloud
is a d-row and r-column matrix, where r is the only information
leaked beyond what is leaked in [1]. Given r, the cloud only knows
that all users are classified into r ranks, without knowing how many
users are in each rank or which users are in which rank. Therefore,
user rank privacy is protected.
The system overview provides the actual view of the application in the
real-time environment. The figure below shows the system overview. As
depicted in the figure, the user generates a query and sends it to the
ADL along with the rank that determines the percentage of matched files
to be fetched. The ADL receives requests from multiple users, merges all
the queries into a single query and sends it to the cloud. The cloud
filters the required proportion of files from the matched files and
sends the result back to the ADL. The ADL distributes the relevant files
to the users.
1. Step 1: Each user sends the query to the ADL, where the user
query consists of the chosen keywords and the query rank.
2. Step 2: Given the users' queries, the ADL runs the Matrix Construct
algorithm (Alg. 1) and sends the resulting mask matrix to the cloud.
3. Step 3: Based on the mask matrix, the cloud runs the File Filter
algorithm (Alg. 2) to filter out a certain percentage of matched files and
returns a union buffer to the ADL.
4. Step 4: The ADL runs the Result Divide algorithm to distribute files to
each user. The ADL first recovers all files that match the user queries, as
in the File Recover algorithm.
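The four steps above can be sketched in plaintext as follows. This is a deliberately simplified illustration and not the algorithms of the report: the real mask matrix is encrypted under the ADL's public key, whereas here it is left in the clear so that the data flow is visible. The rank convention (0 is the highest rank and keeps every matched file), the rule that file j is tested against column j mod r, and all function names are assumptions.

```python
def build_mask_matrix(dictionary, user_queries, num_ranks):
    """ADL side (Step 2): build a d-row, r-column 0/1 matrix.

    user_queries is a list of (keywords, rank) pairs. A keyword takes the
    highest rank (smallest number) among the users who asked for it, and
    its row starts with (num_ranks - rank) ones.
    """
    best_rank = {}
    for keywords, rank in user_queries:
        for kw in keywords:
            best_rank[kw] = min(rank, best_rank.get(kw, num_ranks))
    matrix = []
    for kw in dictionary:
        ones = num_ranks - best_rank.get(kw, num_ranks)  # unqueried: all zeros
        matrix.append([1] * ones + [0] * (num_ranks - ones))
    return matrix


def filter_files(files, dictionary, matrix, num_ranks):
    """Cloud side (Step 3): keep file j if any of its keywords has a 1 in
    column j mod num_ranks. files is a list of (name, keywords) pairs."""
    index = {kw: i for i, kw in enumerate(dictionary)}
    kept = []
    for j, (name, keywords) in enumerate(files):
        col = j % num_ranks
        if any(matrix[index[kw]][col] for kw in keywords if kw in index):
            kept.append((name, keywords))
    return kept


def divide_results(kept, user_queries):
    """ADL side (Step 4): give each user the returned files that share at
    least one keyword with that user's query."""
    return [
        [name for name, kws in kept if set(kws) & set(q_kws)]
        for q_kws, _rank in user_queries
    ]


# Step 1: two users send (keywords, rank) to the ADL.
dictionary = ["A", "B", "C"]
user_queries = [(["A", "B"], 0), (["A", "C"], 1)]
files = [("f0", ["A"]), ("f1", ["C"]), ("f2", ["C"]), ("f3", ["B"])]

matrix = build_mask_matrix(dictionary, user_queries, num_ranks=2)
kept = filter_files(files, dictionary, matrix, num_ranks=2)
per_user = divide_results(kept, user_queries)
```

In this run keyword C has rank 1 of 2, so only every second file matching C alone survives the filter, while files matching A or B are always returned.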
Testing
6.2 Approach
After implementation, the whole system has been tested for
functionality through relevant test cases. Tests were also run to check
whether data is stored properly, and all links have been tested for
functionality.
Bibliography
[2] P. Mell and T. Grance, The NIST definition of cloud computing (draft),
NIST Special Publication, 2011.
[9] C. Wang, N. Cao, J. Li, K. Ren, and W. Lou, Secure ranked keyword
search over encrypted cloud data, in Proc. of IEEE ICDCS, 2010.