Final Report

Distributed Data Sharing Platform for Private Enterprise

CHAPTER 1
INTRODUCTION
Companies of the same industry sector are often connected into a corporate network for
collaboration purposes. Each company maintains its own site and selectively shares a
portion of its business data with the others. Examples of such corporate networks include
supply chain networks, where organizations such as suppliers, manufacturers, and retailers
collaborate with each other to achieve their own business goals, including planning
production lines, making acquisition strategies, and choosing marketing solutions.
From a technical perspective, the key to the success of a corporate network is choosing
the right data sharing platform: a system that makes the shared data (stored and
maintained by different companies) visible network-wide and supports efficient analytical
queries over those data. Traditionally, data sharing is achieved by building a centralized
data warehouse, which periodically extracts data from the internal production systems
(e.g., ERP) of each company for subsequent querying. Unfortunately, such a warehousing
solution has several deficiencies in real deployment.
First, the corporate network needs to scale up to support thousands of participants, while
the installation of a large-scale centralized data warehouse system entails nontrivial costs,
including huge hardware/software investments (total cost of ownership) and high
maintenance costs (total cost of operations). In the real world, most companies are not
keen to invest heavily in additional information systems until they can clearly see the
potential return on investment (ROI). Second, companies want to fully customize the
access control policy to determine which business partners can see which part of their
shared data. Unfortunately, most data warehouse solutions fail to offer such
flexibility. Finally, to maximize revenues, companies often dynamically adjust their
business processes and may change their business partners. Therefore, participants may
join and leave the corporate network at will. The data warehouse solution was not
designed to handle such dynamicity.
To address the aforementioned problems, we present a distributed data sharing platform
designed for corporate network applications. By integrating database and peer-to-peer
(P2P) technologies, the system achieves efficient query processing and offers a
promising approach for corporate network applications, with the following distinguishing
features.

Department of ISE, BIT Bangalore 1

The system extends role-based access control to the inherently distributed environment
of corporate networks. Through a web console interface, companies can easily configure
their access control policies and prevent undesired business partners from accessing their
shared data.
The system employs P2P technology to retrieve data between business partners. Peer
instances are organized as a structured P2P overlay network named BATON. The data are
indexed by table name, column name, and data range for efficient retrieval.
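To make the indexing idea concrete, here is a minimal sketch, assuming integer-valued columns and a single in-process `TreeMap` standing in for the BATON overlay. The class and method names (`RangeIndexSketch`, `publish`, `lookup`) and the peer names are hypothetical; in the real system the index entries are distributed across BATON nodes rather than held in one map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical range index: each entry records which peer stores values of
 * (table, column) within [low, high]. A TreeMap stands in for BATON here.
 */
public class RangeIndexSketch {

    /** One index entry published by a peer for a column. */
    record IndexEntry(String peerId, int low, int high) {}

    // "table.column" -> entries ordered by their lower bound
    private final Map<String, TreeMap<Integer, List<IndexEntry>>> index = new TreeMap<>();

    public void publish(String table, String column, String peerId, int low, int high) {
        index.computeIfAbsent(table + "." + column, k -> new TreeMap<>())
             .computeIfAbsent(low, k -> new ArrayList<>())
             .add(new IndexEntry(peerId, low, high));
    }

    /** Returns the peers whose published range overlaps [qLow, qHigh]. */
    public List<String> lookup(String table, String column, int qLow, int qHigh) {
        List<String> peers = new ArrayList<>();
        TreeMap<Integer, List<IndexEntry>> ranges = index.get(table + "." + column);
        if (ranges == null) return peers;
        // only entries whose lower bound is <= qHigh can overlap the query range
        for (List<IndexEntry> bucket : ranges.headMap(qHigh, true).values()) {
            for (IndexEntry e : bucket) {
                if (e.high() >= qLow) peers.add(e.peerId());
            }
        }
        return peers;
    }

    public static void main(String[] args) {
        RangeIndexSketch idx = new RangeIndexSketch();
        idx.publish("orders", "price", "supplierA", 0, 99);
        idx.publish("orders", "price", "supplierB", 100, 499);
        idx.publish("orders", "price", "retailerC", 500, 999);
        System.out.println(idx.lookup("orders", "price", 50, 150)); // [supplierA, supplierB]
    }
}
```

Only the peers returned by the lookup need to receive a sub-query, which is what keeps simple queries cheap.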
The system employs a hybrid design to achieve high-performance query processing. The
major workload of a corporate network consists of simple, low-overhead queries. Such queries
typically involve only a very small number of business partners and can be
processed in a short time. The system is mainly optimized for these queries. In summary,
the design of the system provides an economical, flexible, and scalable solution for
corporate network applications. We demonstrate the efficiency of our system by
benchmarking it against other large-scale data processing platforms over a set
of queries designed for data sharing applications. The results show that for simple,
low-overhead queries, its performance is significantly better than that of other large-scale data
processing platforms. We also describe the design of the system's core components,
namely the bootstrap peer and the normal peer.
A corporate network can effectively help companies reduce their operational costs and increase
their revenues. However, inter-company data sharing and processing poses unique
challenges to such a data management system, including scalability, performance,
throughput, and security. In this paper, we present BestPeer++, a system that delivers
elastic data sharing services for corporate network applications in the cloud, based on
BestPeer, a peer-to-peer (P2P) based data management platform. By integrating cloud
computing, database, and P2P technologies into one system, BestPeer++ provides an
economical, flexible, and scalable platform for corporate network applications and
delivers data sharing services to participants based on the widely accepted pay-as-you-go
business model. We evaluate BestPeer++ on the Amazon EC2 cloud platform. The
benchmarking results show that BestPeer++ outperforms HadoopDB, a recently proposed
large-scale data processing system, when both systems are employed to
handle typical corporate network workloads. The benchmarking results also demonstrate
that BestPeer++ achieves near-linear scalability of throughput with respect to the number
of peer nodes.
1.1 What is Data Mining?
Structure of Data Mining

Generally, data mining (sometimes called data or knowledge discovery) is the process of
analyzing data from different perspectives and summarizing it into useful information:
information that can be used to increase revenue, cut costs, or both. Data mining
software is one of a number of analytical tools for analyzing data. It allows users to
analyze data from many different dimensions or angles, categorize it, and summarize the
relationships identified. Technically, data mining is the process of finding correlations or
patterns among dozens of fields in large relational databases.
1.2 How Data Mining Works

While large-scale information technology has been evolving separate transaction and
analytical systems, data mining provides the link between the two. Data mining software
analyzes relationships and patterns in stored transaction data based on open-ended user
queries. Several types of analytical software are available: statistical, machine learning,
and neural networks. Generally, four types of relationships are sought:
• Classes: Stored data is used to locate data in predetermined groups. For example,
a restaurant chain could mine customer purchase data to determine when
customers visit and what they typically order. This information could be used to
increase traffic by having daily specials.
• Clusters: Data items are grouped according to logical relationships or consumer
preferences. For example, data can be mined to identify market segments or
consumer affinities.
• Associations: Data can be mined to identify associations. The well-known beer-and-diapers
example (shoppers who buy diapers often also buy beer) is an example of association mining.
• Sequential patterns: Data is mined to anticipate behavior patterns and trends. For
example, an outdoor equipment retailer could predict the likelihood of a backpack
being purchased based on a consumer's purchase of sleeping bags and hiking
shoes.
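Associations of this kind are usually quantified by support (how often the items occur together) and confidence (how often the consequent appears when the antecedent does). The sketch below computes both for the rule {diapers} -> {beer}; the basket data and the class name `AssociationRuleSketch` are invented for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Support and confidence of an association rule X -> Y (illustrative sketch). */
public class AssociationRuleSketch {

    /** Fraction of transactions that contain every item of the itemset. */
    static double support(List<Set<String>> transactions, Set<String> itemset) {
        long hits = transactions.stream().filter(t -> t.containsAll(itemset)).count();
        return (double) hits / transactions.size();
    }

    /** confidence(X -> Y) = support(X together with Y) / support(X); 0 if X never occurs. */
    static double confidence(List<Set<String>> transactions, Set<String> x, Set<String> y) {
        double supportX = support(transactions, x);
        if (supportX == 0) return 0;
        Set<String> both = new HashSet<>(x);
        both.addAll(y);
        return support(transactions, both) / supportX;
    }

    public static void main(String[] args) {
        // Invented baskets, for illustration only
        List<Set<String>> baskets = List.of(
            Set.of("diapers", "beer", "milk"),
            Set.of("diapers", "beer"),
            Set.of("diapers", "bread"),
            Set.of("milk", "bread"));
        System.out.println(support(baskets, Set.of("diapers", "beer")));            // 0.5
        System.out.println(confidence(baskets, Set.of("diapers"), Set.of("beer"))); // about 0.667
    }
}
```

Here two of the four baskets contain both items (support 0.5), and two of the three baskets with diapers also contain beer (confidence 0.5 / 0.75, about 0.67).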
1.2.1 Data mining consists of five major elements:
• Extract, transform, and load transaction data onto the data warehouse system.
• Store and manage the data in a multidimensional database system.
• Provide data access to business analysts and information technology
professionals.
• Analyze the data by application software.
• Present the data in a useful format, such as a graph or table.
1.2.2 Levels of analysis
• Artificial neural networks: Non-linear predictive models that learn through
training and resemble biological neural networks in structure.
• Genetic algorithms: Optimization techniques that use processes such as genetic
combination, mutation, and natural selection in a design based on the concepts of
natural evolution.
• Decision trees: Tree-shaped structures that represent sets of decisions. These
decisions generate rules for the classification of a dataset. Specific decision tree
methods include Classification and Regression Trees (CART) and Chi Square
Automatic Interaction Detection (CHAID). CART and CHAID are decision tree
techniques used for classification of a dataset. They provide a set of rules that you
can apply to a new (unclassified) dataset to predict which records will have a
given outcome. CART segments a dataset by creating 2-way splits, while CHAID
uses chi-square tests to create multi-way splits. CART typically
requires less data preparation than CHAID.
• Nearest neighbor method: A technique that classifies each record in a dataset
based on a combination of the classes of the k record(s) most similar to it in a
historical dataset (where k ≥ 1). Sometimes called the k-nearest neighbor technique.
• Rule induction: The extraction of useful if-then rules from data based on
statistical significance.
• Data visualization: The visual interpretation of complex relationships in
multidimensional data. Graphics tools are used to illustrate data relationships.
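The nearest neighbor method from the list above can be sketched in a few lines. This is a minimal illustration, assuming numeric features, Euclidean distance as the similarity measure, and a simple majority vote among the k nearest historical records; the class name and the sample data are hypothetical. Ties in the vote are broken arbitrarily, so the example uses an odd k.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** k-nearest neighbor classification with Euclidean distance and majority vote (sketch). */
public class KNearestNeighborSketch {

    /** A historical record: numeric features plus its known class. */
    record Sample(double[] features, String label) {}

    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Classifies the query by a majority vote among its k most similar historical records. */
    static String classify(List<Sample> history, double[] query, int k) {
        List<Sample> byDistance = new ArrayList<>(history);
        byDistance.sort(Comparator.comparingDouble((Sample s) -> distance(s.features(), query)));
        Map<String, Integer> votes = new HashMap<>();
        for (Sample s : byDistance.subList(0, Math.min(k, byDistance.size()))) {
            votes.merge(s.label(), 1, Integer::sum);
        }
        // ties are broken arbitrarily; an odd k avoids them when there are two classes
        return Collections.max(votes.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    public static void main(String[] args) {
        List<Sample> history = List.of(
            new Sample(new double[] {1, 1}, "low-value"),
            new Sample(new double[] {1, 2}, "low-value"),
            new Sample(new double[] {2, 1}, "low-value"),
            new Sample(new double[] {8, 8}, "high-value"),
            new Sample(new double[] {8, 9}, "high-value"),
            new Sample(new double[] {9, 8}, "high-value"));
        System.out.println(classify(history, new double[] {2, 2}, 3)); // low-value
        System.out.println(classify(history, new double[] {7, 7}, 3)); // high-value
    }
}
```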
1.2.3 Characteristics of Data Mining:
• Large quantities of data: The volume of data is so great that it has to be analyzed by
automated techniques, e.g., satellite information, credit card transactions, etc.
• Noisy, incomplete data: Imprecise data is characteristic of all data collection.
• Complex data structure: Conventional statistical analysis is not possible.


CHAPTER 2
LITERATURE SURVEY
Many peer-to-peer systems in the literature present various methodologies for sharing
data and processing queries.
BestPeer is one such data management platform; it brings state-of-the-art database
techniques into P2P systems.
Query integration is of utmost importance for the performance of organisations, and one
of the surveyed works suggests various techniques for the efficient and adaptive
integration of queries.
Another work proposes storing data in the BATON tree structure for fast retrieval, a
structure that scales well even for complex queries.
2.1 A Comparative Analysis of Methodologies for Database Schema Integration
C. Batini, M. Lenzerini, and S. Navathe observe that one of the fundamental principles of the
database approach is that a database allows a non-redundant, unified representation of all
data managed in an organization. This is achieved only when methodologies are available
to support integration across organizational and application boundaries. Methodologies
for database design usually perform the design activity by separately producing several
schemas, representing parts of the application, which are subsequently merged. Database
schema integration is the activity of integrating the schemas of existing or proposed
databases into a global, unified schema. The aim of the paper is to provide first a unifying
framework for the problem of schema integration, then a comparative review of the work
done thus far in this area. Such a framework, with the associated analysis of the existing
approaches, provides a basis for identifying strengths and weaknesses of individual
methodologies, as well as general guidelines for future improvements and extensions.
2.2 BATON: A Balanced Tree Structure for Peer-to-Peer Networks
H.V. Jagadish, B.C. Ooi, and Q.H. Vu propose a balanced tree structure overlay on a
peer-to-peer network capable of supporting both exact queries and range queries
efficiently. Although the tree structure causes distinctions to be made between nodes at
different levels in the tree, the authors show that the load at each node is approximately equal.
Although the tree structure provides precisely one path between any pair of nodes, they
show that sideways routing tables maintained at each node provide sufficient fault
tolerance to permit efficient repair. Specifically, in a network with N nodes, they guarantee
that both exact queries and range queries can be answered in O(log N) steps and that
update operations (to both data and network) have an amortized cost of O(log N). An
experimental assessment validates the practicality of the proposal.
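The O(log N) bound comes from routing along a balanced tree. The sketch below is deliberately simplified and is not BATON itself: it builds an ordinary balanced binary tree in which each node (peer) owns one contiguous key range and counts how many nodes a lookup visits. BATON's adjacent links, sideways routing tables, and load balancing are omitted, and all names are hypothetical.

```java
/** Simplified balanced range tree illustrating O(log N) lookups; not the BATON protocol. */
public class BalancedRangeTreeSketch {

    /** A peer owning the half-open key range [low, high). */
    static final class Node {
        final int low, high;
        Node left, right;
        Node(int low, int high) { this.low = low; this.high = high; }
    }

    /** Builds a balanced tree from key-sorted ranges, using ranges[from..to). */
    static Node build(int[][] ranges, int from, int to) {
        if (from >= to) return null;
        int mid = (from + to) >>> 1;
        Node node = new Node(ranges[mid][0], ranges[mid][1]);
        node.left = build(ranges, from, mid);
        node.right = build(ranges, mid + 1, to);
        return node;
    }

    /** N peers, where peer i owns keys [10*i, 10*i + 10). */
    static Node buildNetwork(int n) {
        int[][] ranges = new int[n][];
        for (int i = 0; i < n; i++) ranges[i] = new int[] {10 * i, 10 * i + 10};
        return build(ranges, 0, n);
    }

    /** Number of nodes visited to reach the owner of key, or -1 if no node owns it. */
    static int hopsToOwner(Node root, int key) {
        int hops = 0;
        for (Node cur = root; cur != null; ) {
            hops++;
            if (key < cur.low) cur = cur.left;
            else if (key >= cur.high) cur = cur.right;
            else return hops;
        }
        return -1;
    }

    public static void main(String[] args) {
        int n = 1024;
        Node root = buildNetwork(n);
        int worst = 0;
        for (int key = 0; key < 10 * n; key++) worst = Math.max(worst, hopsToOwner(root, key));
        // worst case equals the tree height: floor(log2 1024) + 1 = 11
        System.out.println("worst-case hops for N=" + n + ": " + worst);
    }
}
```

Doubling N to 2048 adds only one level, which is the practical meaning of the logarithmic bound.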
2.3 Data Sharing in the Hyperion Peer Database System

P. Rodríguez-Gianolli and M. Garzetti present a demonstration of Hyperion, a prototype system
that supports data sharing for a network of independent Peer Relational Database
Management Systems (PDBMSs). The nodes of such a network are assumed to be
autonomous PDBMSs that form acquaintances at run-time, and manage mapping tables to
define value correspondences among different databases. They also use distributed Event-
Condition-Action (ECA) rules to enable and coordinate data sharing. Peers perform local
querying and update processing, and also propagate queries and updates to their
acquainted peers. The demo illustrates the following key functionalities of Hyperion: (1)
the use of (data level) mapping tables to infer new metadata as peers dynamically join the
network, (2) the ability to answer queries using data in acquaintances, and (3) the ability
to coordinate peers through update propagation.
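A mapping table can be pictured as a set of value-to-value correspondences between two peers. The sketch below is a simplified illustration, not Hyperion's implementation: it stores a mapping table as a map from a value at one peer to the set of corresponding values at another, and shows one simple way to infer a table for two peers that are not directly acquainted, by composing two existing tables. The peer roles and product codes are invented.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Composition of value-level mapping tables (simplified illustration). */
public class MappingTableSketch {

    /** Composes A->B and B->C mapping tables into an inferred A->C table. */
    static Map<String, Set<String>> compose(Map<String, Set<String>> ab,
                                            Map<String, Set<String>> bc) {
        Map<String, Set<String>> ac = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : ab.entrySet()) {
            for (String b : entry.getValue()) {
                // values with no correspondence at peer C are simply dropped
                for (String c : bc.getOrDefault(b, Set.of())) {
                    ac.computeIfAbsent(entry.getKey(), k -> new TreeSet<>()).add(c);
                }
            }
        }
        return ac;
    }

    public static void main(String[] args) {
        // Hypothetical product codes used by a supplier, a manufacturer, and a retailer
        Map<String, Set<String>> supplierToMaker = Map.of(
            "S-001", Set.of("M-17"),
            "S-002", Set.of("M-18", "M-19"));
        Map<String, Set<String>> makerToRetailer = Map.of(
            "M-17", Set.of("R-900"),
            "M-19", Set.of("R-901"));
        System.out.println(compose(supplierToMaker, makerToRetailer));
        // {S-001=[R-900], S-002=[R-901]}
    }
}
```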
2.4 Adaptive Multi-Join Query Processing in PDBMS

S. Wu, Q.H. Vu, J. Li, and K.-L. Tan explain that, traditionally,
distributed databases assume that the (small) set of nodes participating in a query is
known a priori, the data is well placed, and the statistics are readily available. However,
these assumptions are no longer valid in a peer-based database management system
(PDBMS). As such, it is a challenge to process and optimize queries in a PDBMS. In their
paper, the authors present a distributed solution to this problem for multi-way join queries.
Their approach first processes a multi-way join query based on an initial query evaluation plan
(generated using statistical data that may be obsolete or inaccurate); as the query is being
processed, statistics obtained on the fly are used to (continuously) refine the current plan
dynamically into a more effective one. An extensive performance study shows that this
adaptive query processing strategy can reduce network traffic significantly.


CHAPTER 3
PROPOSED METHODOLOGY
3.1 PROBLEM DEFINITION

The goal is to develop an efficient system that provides economical, flexible, and scalable
solutions for corporate network applications. In doing so, there are various challenges in
adaptive query processing, where an efficient mechanism is required to distinguish
between simple and complex queries. A mechanism is needed whereby complex
queries are channelled to be handled in a different manner. Also, the storage of data for
retrieval plays a vital role in allowing queries to scale up. The designed
technologies have to be well implemented in the P2P environment. Unfortunately,
previous works are far from optimal.
3.2 EXISTING SYSTEM

The corporate network needs to scale up to support thousands of participants, while the
installation of a large-scale centralized data warehouse system entails nontrivial costs,
including huge hardware/software investments (total cost of ownership) and high
maintenance costs (total cost of operations).
In the real world, most companies are not keen to invest heavily in additional
information systems until they can clearly see the potential return on investment (ROI).
Companies want to fully customize the access control policy to determine which business
partners can see which part of their shared data. Users of the system expect fast and
correct retrieval of data without any delay.
Delays are bound to occur because a single standard mechanism, which is neither dynamic
nor adaptable, is used to process all queries.
3.2.1 DISADVANTAGES OF THE EXISTING SYSTEM

• Most data warehouse solutions fail to offer flexible access control.
• The solution has not been designed to handle participants dynamically joining and
leaving the network.
• Provided resources are not used efficiently.
• Simple queries consume the same time that is required to process
complex queries.
3.3 PROPOSED SYSTEM

The unique challenges posed by sharing and processing data in an inter-business
environment are addressed by the proposed system through elastic data sharing services,
built by integrating database and peer-to-peer technologies.
The proposed system is enhanced with distributed access control and multiple types of
indexes for delivering elastic data sharing services. The software components of the
system are separated into two parts: the core and the adapter.
o The core contains all the data sharing functionalities and is designed to be
platform independent.
o The adapter contains one abstract adapter describing the elastic
infrastructure service interface and a set of concrete adapter components
which implement that interface through the APIs provided by the underlying infrastructure.
The two-level design helps us achieve portability. With suitable adapters, the system
can easily be deployed in cloud as well as non-cloud environments.
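The core/adapter split described above is essentially the adapter pattern applied to the infrastructure layer. A minimal sketch follows; every name (`InfrastructureAdapter`, `launchInstance`, `LocalProcessAdapter`, `SimulatedCloudAdapter`, `PeerCore`) is hypothetical, and neither concrete adapter calls a real provider API. A real cloud adapter would wrap the provider's SDK at the marked points.

```java
import java.util.ArrayList;
import java.util.List;

/** Abstract adapter: the elastic infrastructure service interface (hypothetical). */
interface InfrastructureAdapter {
    String launchInstance(String peerName);
    void terminateInstance(String instanceId);
}

/** Concrete adapter for a non-cloud deployment; instances are only simulated. */
class LocalProcessAdapter implements InfrastructureAdapter {
    final List<String> running = new ArrayList<>();

    public String launchInstance(String peerName) {
        String id = "local-" + peerName;
        running.add(id);
        return id;
    }

    public void terminateInstance(String instanceId) { running.remove(instanceId); }
}

/** Concrete adapter for a cloud deployment; a real one would call the provider's SDK here. */
class SimulatedCloudAdapter implements InfrastructureAdapter {
    private int counter = 0;

    public String launchInstance(String peerName) { return "vm-" + (++counter) + "-" + peerName; }

    public void terminateInstance(String instanceId) { /* provider-specific call would go here */ }
}

/** Platform-independent core: it depends only on the abstract adapter. */
public class PeerCore {
    private final InfrastructureAdapter adapter;

    PeerCore(InfrastructureAdapter adapter) { this.adapter = adapter; }

    String startNormalPeer(String businessName) {
        return adapter.launchInstance(businessName);
    }

    public static void main(String[] args) {
        System.out.println(new PeerCore(new LocalProcessAdapter()).startNormalPeer("supplierA"));   // local-supplierA
        System.out.println(new PeerCore(new SimulatedCloudAdapter()).startNormalPeer("supplierA")); // vm-1-supplierA
    }
}
```

Porting to a new environment then means writing one more `InfrastructureAdapter` implementation, while `PeerCore` stays unchanged.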
Every client that is to use the network is registered and given its own instance, making it
an authorized entity that can access information in the network.
The system accepts queries for processing. As soon as a query arrives from an
authorized client, it is analyzed for complexity: if it is simple, it is processed by our
system; otherwise, it is sent through an interface to the system designed to
handle complex queries.
The processing also utilizes the peers involved: each peer processes its part of the query
and sends back its results, which are then aggregated and sent to the requesting client.
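The fan-out and aggregation step can be sketched with the standard `java.util.concurrent` executor API. This is a minimal sketch under stated assumptions: `Peer` is a hypothetical stand-in for a remote normal peer, results are plain strings, and aggregation is a simple concatenation of the partial results. The real system's sub-query format and network transport are not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FanOutAggregateSketch {

    /** Hypothetical stand-in for a remote normal peer that answers a sub-query. */
    interface Peer {
        List<String> execute(String subQuery) throws Exception;
    }

    /** Sends the sub-query to all peers in parallel and merges their partial results. */
    static List<String> fanOut(List<Peer> peers, String subQuery)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, peers.size()));
        try {
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (Peer peer : peers) {
                tasks.add(() -> peer.execute(subQuery));
            }
            List<String> merged = new ArrayList<>();
            // invokeAll waits for every task; the futures come back in task order
            for (Future<List<String>> partial : pool.invokeAll(tasks)) {
                merged.addAll(partial.get());
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Peer supplier = q -> List.of("supplier row for " + q);
        Peer retailer = q -> List.of("retailer row 1", "retailer row 2");
        System.out.println(fanOut(List.of(supplier, retailer), "sub-query-1"));
    }
}
```

If any peer throws, `partial.get()` rethrows it wrapped in an `ExecutionException`, so a failed peer surfaces to the caller instead of silently producing a partial answer.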
3.3.1 ADVANTAGES OF PROPOSED SYSTEM

• Our system can efficiently handle typical workloads in a corporate network and
can deliver near-linear query throughput as the number of normal peers grows.
• Efficient query processing for the typical queries that make up the
majority of corporate queries.
• Adaptive query processing is supported, which helps deal with the
dynamicity of the queries and results in high performance.
• Therefore, the proposed system is a promising solution for efficient data
sharing within corporate networks.


CHAPTER 4
SOFTWARE DESCRIPTION
4.1 Java
Java is an object-oriented, multithreaded programming language. It is designed to be small,
simple, and portable across different platforms and operating systems.
4.1.1 Features of Java
Platform Independence
• The Write-Once-Run-Anywhere ideal has not been fully achieved (tuning for different
platforms is usually required), but Java comes closer to it than other languages.
Object Oriented
• Object oriented throughout: no coding outside of class definitions, including
main().
• An extensive class library is available in the core language packages.
Compiler/Interpreter Combo
• Code is compiled to byte codes that are interpreted by a Java virtual machine
(JVM).
• This provides portability to any machine for which a virtual machine has been
written.
• The two steps of compilation and interpretation allow for extensive code checking
and improved security.
Robust
• Built-in exception handling, strong type checking (that is, all data must be
declared with an explicit type), and local variables must be initialized before use.
Automatic Memory Management
• Automatic garbage collection - memory management handled by JVM.
Security
• No memory pointers
• Programs run inside the virtual machine sandbox.
• Array index limit checking
• Code pathologies reduced by
§ Byte code verifier - checks classes after loading
§ Class loader - confines objects to unique namespaces. Prevents
loading a hacked "java.lang.SecurityManager" class, for example.
§ Security manager - determines what resources a class can access
such as reading and writing to the local disk.
Dynamic Binding
• The linking of data and methods to where they are located is done at run-time.
• New classes can be loaded while a program is running. Linking is done on the fly.
• Even if libraries are recompiled, there is no need to recompile code that uses
classes in those libraries.
• This differs from C++, which uses static binding. This can result in fragile classes
for cases where linked code is changed and memory pointers then point to the
wrong addresses.
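The run-time loading and binding described above can be seen with the standard reflection API. In this minimal sketch the class name is known only as a string until the program runs; `Class.forName` locates and links the class, and the later `add` calls are dispatched to whichever implementation was loaded. The demo class name is hypothetical, and the JDK collection classes are chosen only for illustration.

```java
import java.util.Collection;

/** Loads a class by name at run time and calls it through an interface. */
public class DynamicLoadingDemo {

    static Collection<String> loadCollection(String className) throws ReflectiveOperationException {
        // Class.forName locates, loads, and links the class while the program runs
        Class<?> cls = Class.forName(className);
        // Reflectively invoke the public no-argument constructor
        Object instance = cls.getDeclaredConstructor().newInstance();
        @SuppressWarnings("unchecked")
        Collection<String> collection = (Collection<String>) instance;
        return collection;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        for (String name : new String[] {"java.util.ArrayList", "java.util.TreeSet"}) {
            Collection<String> c = loadCollection(name);
            c.add("b");
            c.add("a");
            c.add("a");
            // add() is bound to ArrayList.add or TreeSet.add at run time
            System.out.println(name + " -> " + c);
        }
    }
}
```

The same source line `c.add("a")` keeps duplicates for the list and discards them for the sorted set, because the method body is chosen only after the class has been loaded.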
Good Performance
• Interpretation of byte codes slowed performance in early versions, but advanced
virtual machines with adaptive and just-in-time compilation and other techniques
now typically provide 50% to 100% of the speed of C++ programs.


Threading
• Lightweight processes, called threads, can easily be spun off to perform
multiprocessing.
• Can take advantage of multiprocessors where available.
• Great for multimedia displays.
Built-in Networking
• Java was designed with networking in mind and comes with many classes to
develop sophisticated Internet communications.
Applications written for the Information Module Profile (IMP) of Java ME are called IMlets,
but in reality they are MIDlets. They subclass MIDlet and follow the same packaging,
deployment, security, and life-cycle model as MIDlets.
4.1.2 Connected Device Configuration
The Connected Device Configuration (CDC) is a smaller subset of Java SE, containing almost
all the libraries that are not GUI related.
4.1.3 Personal Basis Profile
The Personal Basis Profile extends the Foundation Profile to include lightweight GUI support
in the form of an AWT subset.
4.1.4 Personal Profile
This extension of Personal Basis Profile includes a more comprehensive AWT subset and
adds applet support.
4.2 JAVASCRIPT
JavaScript is a lightweight, interpreted, dynamic programming language with object-oriented
capabilities. It is designed for creating network-centric applications and is complementary
to and integrated with Java. JavaScript is very easy to implement because it is integrated
with HTML. It is open and cross-platform. It is most commonly used as part of web pages,
whose implementations allow client-side scripts to interact with the user and make pages
dynamic.
4.2.1 Merits of using JavaScript
• Less server interaction: You can validate user input before sending the page off
to the server. This saves server traffic, which means less load on your server.
• Immediate feedback to the visitors: They don't have to wait for a page reload to
see if they have forgotten to enter something.
• Increased interactivity: You can create interfaces that react when the user hovers
over them with a mouse or activates them via the keyboard.
• Richer interfaces: You can use JavaScript to include such items as drag-and-drop
components and sliders to give a rich interface to your site visitors.
4.2.2 Limitations of JavaScript
• Client-side JavaScript does not allow the reading or writing of files. This
restriction exists for security reasons.
• JavaScript cannot be used for networking applications because there is no such
support available.
• JavaScript doesn't have any multithreading or multiprocessor capabilities.
4.3 APACHE TOMCAT SERVER
Tomcat is an application server designed to execute Java servlets and render web pages
that use JavaServer Pages coding. Available as either a binary or a source code
distribution, Tomcat has been used to power a wide range of applications and websites
across the Internet. At the time of writing, it is one of the more popular servlet containers
available.
4.3.1 Advantages of using Apache Tomcat to run our website’s Java applications
• It’s Incredibly Lightweight
Although it implements key Java EE specifications such as Servlet and JSP, Tomcat is an
incredibly lightweight application. It offers only the most basic functionality necessary
to run a server, meaning it provides relatively quick load and redeploy times compared to
many of its peers, which are bogged down with far too many bells and whistles. This
lightweight nature also allows it to enjoy a significantly faster development cycle.
• It’s Open-Source
Tomcat is free, and the source code for the server is readily available to anyone
who wants to download it. This means that, assuming we are willing to tinker with
the moving parts of our server, we have an incredible degree of freedom in what
we do with a Tomcat installation.
• It’s Highly Flexible
Thanks to its lightweight nature and a suite of extensive, built-in customization
options, Tomcat is quite flexible. We can run it in virtually any fashion we
choose, and it will still work as intended. The fact that it is open-source helps as
well, since we can tweak it to fit our needs, provided we have the knowledge to
do so.
• Server Will Be More Stable
Tomcat is an extremely stable platform to build on, and using it to run our
applications will contribute to our server’s stability as well. This is because
Tomcat runs independently of our Apache installation: even if a significant
failure in Tomcat caused it to stop working, the rest of our server would run just
fine.
4.4 JAVA SERVER PAGE
JavaServer Pages (JSP) is a server-side programming technology that provides a
dynamic, platform-independent method for building web-based applications. JSP pages have
access to the entire family of Java APIs, including the JDBC API to access enterprise
databases.
JavaServer Pages often serve the same purpose as programs implemented using the
Common Gateway Interface (CGI), but JSP offers several advantages in comparison with
CGI.
• Performance is significantly better because JSP allows embedding dynamic
elements in the HTML pages themselves instead of requiring separate CGI files.
• JSP pages are always compiled before they are processed by the server, unlike CGI/Perl,
which requires the server to load an interpreter and the target script each time the page is
requested.
• JavaServer Pages are built on top of the Java Servlets API, so like Servlets, JSP
also has access to all the powerful Enterprise Java APIs, including JDBC, JNDI,
EJB, JAXP etc.
• JSP pages can be used in combination with servlets that handle the business logic,
the model supported by Java servlet template engines.
Finally, JSP is an integral part of Java EE, a complete platform for enterprise class
applications. This means that JSP can play a part in the simplest applications to the most
complex and demanding.
4.5 MYSQL
MySQL provides an implementation of a SQL database that is very well suited to small and
medium-sized web applications. The database is free and open source, with a commercial
license available (MySQL is now owned by Oracle, which acquired it when it bought Sun).
Common applications for MySQL include PHP- and Java-based web applications that
require a database storage backend, e.g., DokuWiki, Joomla, XWiki, etc. Many applications
that use MySQL are geared towards the LAMP stack (Linux, Apache, MySQL, PHP).


4.5.1 Advantages Of Using MySQL
• It’s Easy to Use: MySQL is very easy to install, and thanks to a bevy of third-party
tools that can be added to the database, setting up an implementation is a
relatively simple task. In addition, it’s also an easy database to work with. So long
as you understand the language, you shouldn’t run into too many problems.
• Support Is Readily Available Whenever Necessary: Although Oracle’s history of
supporting its customers can be spotty at best, the nature of MySQL, which got
its start as an open-source platform, means that there’s a large and thriving
community of developers and enthusiasts to which one can turn for help. This is
due in large part to the popularity of the solution, the end result of which is no
shortage of experts.
• It’s Open-Source: Oracle’s purchase of Sun Microsystems (and by association,
MySQL) was met with some contention from the development community. The
general fear was that Oracle would transform the tool into a closed, proprietary
ecosystem. Thankfully, though Oracle has tightened its grip on MySQL
somewhat, it can still be considered an open-source database option, as the code is
still available for free online.
• It’s Incredibly Inexpensive: Depending on what you plan to use it for, a MySQL
implementation could range in price from free to $10,000 or more. Either way, it’s
significantly less expensive than most other database options on the market (save
for MySQL’s open-source competitors).
• It’s an Industry Standard: Although MySQL’s popularity has waned somewhat in
recent years, it remains one of the most-used database systems in the world. It’s
compatible with virtually every operating system and is more or less an industry
standard, in spite of all the people who say it’s on the way out.
4.6 HTML AND CSS
HTML, HyperText Markup Language, gives content structure and meaning by defining
that content as, for example, headings, paragraphs, or images. CSS, or Cascading Style
Sheets, is a presentation language created to style the appearance of content, using, for
example, fonts or colors.
The two languages, HTML and CSS, are independent of one another and should
remain that way. CSS should not be written inside an HTML document and vice versa.
As a rule, HTML will always represent content, and CSS will always represent the
appearance of that content.


CHAPTER 5
REQUIREMENT SPECIFICATION
HARDWARE REQUIREMENTS:
• System : Pentium IV 2.4 GHz.
• Hard Disk : 40 GB.
• Floppy Drive : 1.44 MB.
• Monitor : 15-inch VGA Colour.
• Mouse : Logitech.
• RAM : 512 MB.
SOFTWARE REQUIREMENTS:
• Operating system : Windows XP/7.
• Coding Language : JAVA/J2EE, HTML/CSS
• IDE : NetBeans 7.4
• Database : MySQL


CHAPTER 6
DETAILED DESIGN DESCRIPTION

6.1 SYSTEM ARCHITECTURE
Fig. 6.1 The BestPeer++ network deployed on a cloud offering
The system implements the following modules:
• Peer++ Processing Approach
• Parallel P2P Processing
• Implementing MapReduce
• Adaptive Query Processing


The system employs two query processing approaches: basic processing and adaptive
processing. The basic query processing strategy is similar to the one adopted in the
distributed databases domain. Overall, a query submitted to a normal peer P is
evaluated in two steps: fetching and processing. In the fetching step, the query is
decomposed into a set of sub-queries, which are then sent to the remote normal peers that
host the data involved in the query (the list of these normal peers is determined by
searching the indices stored in BATON).
For each join, instead of forwarding all tuples to a single processing node, we
disseminate them to a set of nodes, which process the join in parallel. We adopt the
conventional replicated join approach: the small table is replicated to all
processing nodes and joined with a partition of the large table.
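A replicated join can be simulated on a single machine as follows. This is a minimal illustration, assuming integer join keys, round-robin partitioning of the large table, and an in-memory hash join on each simulated processing node; in the real system each partition would be processed on a different normal peer in parallel. The class name, record, and data are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReplicatedJoinSketch {

    record Row(int key, String value) {}

    /** Work done by one node: hash the replicated small table, probe with the local partition. */
    static List<String> joinOnNode(List<Row> smallTable, List<Row> largePartition) {
        Map<Integer, List<String>> hash = new HashMap<>();
        for (Row r : smallTable) {
            hash.computeIfAbsent(r.key(), k -> new ArrayList<>()).add(r.value());
        }
        List<String> out = new ArrayList<>();
        for (Row r : largePartition) {
            for (String match : hash.getOrDefault(r.key(), List.of())) {
                out.add(r.key() + ":" + match + "," + r.value());
            }
        }
        return out;
    }

    /** Splits the large table over `nodes` partitions; every node receives the full small table. */
    static List<String> replicatedJoin(List<Row> small, List<Row> large, int nodes) {
        List<List<Row>> partitions = new ArrayList<>();
        for (int i = 0; i < nodes; i++) partitions.add(new ArrayList<>());
        for (int i = 0; i < large.size(); i++) partitions.get(i % nodes).add(large.get(i));
        List<String> result = new ArrayList<>();
        for (List<Row> partition : partitions) {
            result.addAll(joinOnNode(small, partition)); // would run in parallel on separate peers
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> nations = List.of(new Row(1, "SG"), new Row(2, "IN"));
        List<Row> orders = List.of(new Row(1, "o1"), new Row(2, "o2"), new Row(1, "o3"), new Row(3, "o4"));
        System.out.println(replicatedJoin(nations, orders, 2)); // [1:SG,o1, 1:SG,o3, 2:IN,o2]
    }
}
```

The network cost of this approach grows with the size of the small table times the number of nodes, which is why it suits joins where one side is small.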
The main difference between the MapReduce method and the native P2P method lies in
join processing. In the MapReduce method, instead of replicated joins, the symmetric-hash
join approach is adopted. Each mapper reads its local data and shuffles the
intermediate tuples according to the hash value of the join key.
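The shuffle-then-join behaviour can be simulated in memory. The sketch below assumes integer join keys, tuples tagged with their source table ("L" or "R"), and a given number of reducers. It routes each mapper output tuple by the hash of its join key and then runs a symmetric hash join inside each reducer, where every arriving tuple is inserted into its own side's hash table and probes the other side's. The class and record names are hypothetical, and no real MapReduce framework API is used.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SymmetricHashJoinSketch {

    /** A mapper output tuple tagged with its source table ("L" or "R"). */
    record Tagged(String table, int key, String value) {}

    /** Shuffle step: the reducer is chosen by the hash of the join key. */
    static int reducerFor(int key, int reducers) {
        return Math.floorMod(Integer.hashCode(key), reducers);
    }

    static List<String> join(List<Tagged> mapperOutput, int reducers) {
        List<List<Tagged>> buckets = new ArrayList<>();
        for (int i = 0; i < reducers; i++) buckets.add(new ArrayList<>());
        for (Tagged t : mapperOutput) buckets.get(reducerFor(t.key(), reducers)).add(t);

        List<String> out = new ArrayList<>();
        for (List<Tagged> bucket : buckets) {
            Map<Integer, List<String>> left = new HashMap<>();
            Map<Integer, List<String>> right = new HashMap<>();
            for (Tagged t : bucket) {
                boolean isLeft = t.table().equals("L");
                Map<Integer, List<String>> own = isLeft ? left : right;
                Map<Integer, List<String>> other = isLeft ? right : left;
                own.computeIfAbsent(t.key(), k -> new ArrayList<>()).add(t.value());
                // probe the opposite side; each matching pair is emitted exactly once
                for (String match : other.getOrDefault(t.key(), List.of())) {
                    String l = isLeft ? t.value() : match;
                    String r = isLeft ? match : t.value();
                    out.add(t.key() + ":" + l + "," + r);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tagged> mapperOutput = List.of(
            new Tagged("L", 1, "a"), new Tagged("R", 1, "x"), new Tagged("L", 2, "b"),
            new Tagged("R", 1, "y"), new Tagged("R", 3, "z"));
        System.out.println(join(mapperOutput, 2)); // [1:a,x, 1:a,y]
    }
}
```

Unlike the replicated join, each tuple crosses the network once, so no table is copied to every node; the price is the job start-up and shuffle overhead.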
For small jobs, the P2P engine performs better than the MapReduce engine, as it does not
incur initialization cost and database join algorithms have been well optimized. However,
for large-scale data analytic jobs, the MapReduce engine is more scalable, as it does not
incur recursive data replication. Based on the cost models of the two engines, we propose
our adaptive query processing approach. When a query is submitted, the query planner
retrieves the related histogram and index information from the bootstrap node, analyzes the
query, and constructs a processing graph for it.
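The choice between the two engines can be expressed as a comparison of estimated costs. The cost formulas below are hypothetical simplifications for a single join, not the system's actual cost models: the P2P cost charges for replicating the small table to every processing node, while the MapReduce cost adds a fixed job start-up time but ships each table only once. All parameter values in the example are invented.

```java
/** Picks the cheaper engine under two hypothetical, simplified cost formulas. */
public class EngineChooserSketch {

    enum Engine { P2P, MAPREDUCE }

    /** Hypothetical P2P cost in seconds: small table copied to every node, large table read once. */
    static double p2pCost(double smallMb, double largeMb, int nodes, double mbPerSec) {
        return (smallMb * nodes + largeMb) / mbPerSec;
    }

    /** Hypothetical MapReduce cost in seconds: job start-up plus shuffling both tables once. */
    static double mapReduceCost(double smallMb, double largeMb, double startupSec, double mbPerSec) {
        return startupSec + (smallMb + largeMb) / mbPerSec;
    }

    static Engine choose(double smallMb, double largeMb, int nodes,
                         double startupSec, double mbPerSec) {
        double p2p = p2pCost(smallMb, largeMb, nodes, mbPerSec);
        double mr = mapReduceCost(smallMb, largeMb, startupSec, mbPerSec);
        return p2p <= mr ? Engine.P2P : Engine.MAPREDUCE;
    }

    public static void main(String[] args) {
        // small job: 1 MB joined with 10 MB on 4 nodes -> 1.4 s (P2P) vs 21.1 s (MapReduce)
        System.out.println(choose(1, 10, 4, 20, 10));            // P2P
        // large job: 5,000 MB joined with 50,000 MB on 20 nodes -> 1500 s vs 570 s
        System.out.println(choose(5_000, 50_000, 20, 20, 100));  // MAPREDUCE
    }
}
```

The fixed start-up term dominates for small jobs, while the replication term (small table times node count) dominates for large ones, matching the trade-off described in the text.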
6.1.1 The BestPeer++ Core
The BestPeer++ core contains all platform-independent logic, including query processing
and the P2P overlay. It runs on top of the cloud adapter and consists of two software
components: the bootstrap peer and the normal peer. A BestPeer++ network can have only a
single bootstrap peer instance, which is always launched and maintained by the BestPeer++
service provider, and a set of normal peer instances. The architecture is depicted in
Fig. 6.1. This section briefly describes the functionalities of these two kinds of peer.
Individual components and data flows inside these peers are presented in the subsequent
sections.
Fig. 6.2 Offline data flow
Fig. 6.3 Online data flow
The bootstrap peer is the entry point of the whole network and has several responsibilities.
First, it serves various administrative purposes, including monitoring and managing
normal peers and scheduling network management events. Second, it acts as a central
repository for the metadata of corporate network applications, including the shared
global schema, the list of participating normal peers, and role definitions. In addition,
BestPeer++ employs the standard PKI encryption scheme to encrypt and decrypt data
transmitted between normal peers in order to further increase the security of the system.
Thus, the bootstrap peer also acts as a certificate authority (CA) that certifies the
identities of normal peers.
Normal peers are the BestPeer++ instances launched by businesses. Each normal peer is
owned and managed by an individual business and serves the data retrieval requests
issued by the users of that business. To meet the high throughput requirement, BestPeer++
does not rely on a centralized server to locate which normal peer holds which tables.
Instead, the normal peers are organized as a balanced tree peer-to-peer overlay based on
BATON. Query processing is thus performed in an entirely distributed manner.
6.1.2 BOOTSTRAP PEER
The bootstrap peer is run by the BestPeer++ service provider, and its main function is to
manage the BestPeer++ network. This section presents how the bootstrap peer performs
various administrative tasks.
Managing Normal Peer Join/Departure
Each normal peer that intends to join an existing corporate network must first connect to
the bootstrap peer. If the join request is permitted by the service provider, the bootstrap
peer puts the newly joined peer into the peer list of the corporate network. At the same
time, the joined peer receives the corporate network information, including the current
participants, the global schema, role definitions, and an issued certificate. When a normal
peer needs to leave the network, it also notifies the bootstrap peer first. The bootstrap
peer moves the departing peer to the blacklist and marks its certificate invalid. The
bootstrap peer then reclaims all resources allocated to the departing peer and finally
removes it from the peer list.
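The join and departure bookkeeping can be modelled in a few lines. The sketch below is hypothetical: `PeerRegistry`, its methods, and the use of a random UUID as a stand-in for a CA-issued certificate are illustrative assumptions, and resource reclamation is left as a placeholder.

```java
import java.util.*;

/** Hypothetical in-memory model of the bootstrap peer's join/departure bookkeeping. */
class PeerRegistry {
    private final Set<String> peerList = new LinkedHashSet<>();
    private final Set<String> blacklist = new LinkedHashSet<>();
    private final Map<String, String> certificates = new HashMap<>(); // peerId -> issued certificate id
    private final Set<String> revoked = new HashSet<>();              // certificates marked invalid

    /**
     * Admits a peer approved by the service provider and returns its certificate id, or null.
     * A real bootstrap peer would also send the participant list, global schema and role definitions.
     */
    String join(String peerId, boolean approvedByProvider) {
        if (!approvedByProvider) return null;
        peerList.add(peerId);
        String cert = UUID.randomUUID().toString(); // stand-in for a CA-signed certificate
        certificates.put(peerId, cert);
        return cert;
    }

    /** Handles a departure notice in the order the text describes. */
    void leave(String peerId) {
        if (!peerList.contains(peerId)) return;
        blacklist.add(peerId);                     // 1. move the peer to the blacklist
        String cert = certificates.remove(peerId);
        if (cert != null) revoked.add(cert);       // 2. mark its certificate invalid
        reclaimResources(peerId);                  // 3. reclaim its resources
        peerList.remove(peerId);                   // 4. remove it from the peer list
    }

    boolean isCertificateValid(String cert) {
        return cert != null && !revoked.contains(cert) && certificates.containsValue(cert);
    }

    List<String> peers() { return new ArrayList<>(peerList); }

    List<String> blacklisted() { return new ArrayList<>(blacklist); }

    private void reclaimResources(String peerId) {
        // Placeholder: releasing cloud instances and storage is outside this sketch.
    }
}
```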
Auto Fail-Over and Auto-Scaling
In addition to managing peer join and departure, the bootstrap peer spends most of its
running time monitoring the health of normal peers and scheduling fail-over and
auto-scaling events. Algorithm 1 shows how the daemon service of the bootstrap peer works.
The bootstrap peer periodically collects performance metrics of each normal peer. If some
peers malfunction or crash, the bootstrap peer triggers an automatic fail-over event for
each failed normal peer: it launches a new instance to replace the failed peer and asks the
newly launched instance to perform database recovery from the latest stored database
backup. Finally, the failed peer is put into the blacklist. Similarly, if any normal peer is
overloaded (e.g., its CPU is over-utilized or its free storage space is low), the bootstrap
peer triggers an auto-scaling event to either upgrade the normal peer to a larger instance
or allocate more storage space. At the end of each maintenance epoch, the bootstrap peer
releases the resources in the blacklist and notifies all participants of the changes.
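One maintenance epoch of the daemon can be sketched as below. It is not a reproduction of Algorithm 1: the `CloudOps` interface, its method names, the `Metrics` fields and both overload thresholds are hypothetical placeholders for the cloud operations and metrics the text describes.

```java
import java.util.*;

/** Hypothetical sketch of one maintenance epoch of the bootstrap daemon. */
class BootstrapDaemon {

    /** Metrics collected from a normal peer (fields are illustrative assumptions). */
    static final class Metrics {
        final String peerId;
        final boolean alive;
        final double cpuUtil;          // 0.0 .. 1.0
        final double freeStorageRatio; // 0.0 .. 1.0
        Metrics(String peerId, boolean alive, double cpuUtil, double freeStorageRatio) {
            this.peerId = peerId; this.alive = alive;
            this.cpuUtil = cpuUtil; this.freeStorageRatio = freeStorageRatio;
        }
    }

    /** Cloud-side operations the daemon relies on; names are placeholders, not a real API. */
    interface CloudOps {
        String launchReplacement(String failedPeerId); // returns the new instance id
        void recoverFromLatestBackup(String newPeerId);
        void scaleUp(String peerId);                   // larger instance or more storage
        void release(String peerId);
        void notifyParticipants(List<String> events);
    }

    static final double CPU_LIMIT = 0.9;        // assumed threshold
    static final double MIN_FREE_STORAGE = 0.1; // assumed threshold

    static List<String> runEpoch(List<Metrics> collected, CloudOps cloud) {
        List<String> blacklist = new ArrayList<>();
        List<String> events = new ArrayList<>();
        for (Metrics m : collected) {
            if (!m.alive) {                                          // auto fail-over
                String replacement = cloud.launchReplacement(m.peerId);
                cloud.recoverFromLatestBackup(replacement);
                blacklist.add(m.peerId);
                events.add("failover " + m.peerId + " -> " + replacement);
            } else if (m.cpuUtil > CPU_LIMIT || m.freeStorageRatio < MIN_FREE_STORAGE) { // auto-scaling
                cloud.scaleUp(m.peerId);
                events.add("scale " + m.peerId);
            }
        }
        for (String peer : blacklist) cloud.release(peer);           // end of epoch: release blacklisted resources
        cloud.notifyParticipants(events);
        return events;
    }
}
```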
In a data sharing platform like our system, enforcing the system's consistency guarantees is
a crucial but difficult task. An important issue is the consistency of the whole system when
there are node failures; more specifically, how queries can be executed in these situations.
Business applications rely on accurate summarization of data and thus may suffer from
any form of data inconsistency. Therefore, the widely used eventual consistency model and
other weakened consistency models do not fit our case. In BestPeer++, we opt to enforce
strong consistency by guaranteeing that all necessary data in a business scope is online at
query time. When a node crashes, all affected queries are blocked until the auto fail-over
process is completed. In this way we can provide correctness and consistency guarantees
at the expense of some latency. However, given that the recovery time complies with the
SLA's constraints, this latency is kept within an acceptable range.
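The blocking behaviour can be sketched with a Java monitor. In this hypothetical `RecoveryGate`, the fail-over process marks a peer as recovering and later as recovered, and a query waits until none of the peers it needs is recovering. The timeout parameter is an assumption standing in for the SLA bound on recovery time.

```java
import java.util.*;
import java.util.concurrent.TimeUnit;

/** Hypothetical gate that blocks queries while a peer they depend on is being recovered. */
class RecoveryGate {
    private final Set<String> recovering = new HashSet<>();

    /** Called when auto fail-over starts for a peer. */
    synchronized void markRecovering(String peerId) {
        recovering.add(peerId);
    }

    /** Called when fail-over completes; wakes every query waiting on the gate. */
    synchronized void markRecovered(String peerId) {
        recovering.remove(peerId);
        notifyAll();
    }

    /**
     * Blocks until none of the involved peers is recovering or the timeout expires.
     * Returns true if the query may run, false if it timed out or was interrupted.
     */
    synchronized boolean awaitOnline(Collection<String> involvedPeers, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!Collections.disjoint(recovering, involvedPeers)) {
            long remainingMillis = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMillis <= 0) return false;
            try {
                wait(remainingMillis); // releases the monitor while waiting
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

Queries that touch only healthy peers pass straight through, so blocking is limited to the business scope affected by the failure.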
6.1.3 NORMAL PEER
The normal peer software consists of five components: schema mapping, data loader, data
indexer, access control, and query executor. We present the first four components in this
section. Query processing in the system will be presented in the next section.
There are two data flows inside the normal peer: an offline data flow and an online data
flow. In the offline data flow, the data are extracted periodically by a data loader from the
business production system to the normal peer instance. In particular, the data loader
extracts the data from the business production system, transforms the data format from its
local schema to the shared global schema of the corporate network according to the
schema mapping, and finally stores the results in the MySQL databases hosted in the
normal peer.
In the online data flow, user queries are submitted to the normal peer and then processed
by the query processor, which uses a fetch-and-process strategy. The query processor first
parses the query and then employs the BATON search algorithm to identify the peers that
hold the data related to the query.
Schema Mapping
Schema mapping is a component that defines the mapping between the local schema of
each production system and the global shared schema employed by the corporate
network. Currently, the system supports only relational schema mapping; namely, both the
local schema and the global schema are relational. The mapping consists of metadata
mappings (i.e., mapping local table definitions to global table definitions) and value
mappings (i.e., mapping local terms to global terms). Besides schema-level mapping, the
system also supports instance-level mapping, which complements the mapping process
when there is not sufficient schema information. In general, the schema mapping process
requires human involvement and is rather time-consuming. However, it needs to be
performed only once. Furthermore, the system adopts templates to facilitate the mapping
process. Specifically, for each popular production system, we provide a mapping template
that defines the transformation of the local schemas of that system to the global schema.
A business only needs to modify the mapping template to meet its own needs. We found
that this template approach works well in practice and significantly reduces the service
setup effort.
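A metadata mapping plus a value mapping can be applied to a local row as follows. This is a minimal sketch under assumed names (`SchemaMapping`, `column`, `value`, `apply`), with rows modelled as column-to-value maps; the example table and column names in the usage are made up. In this picture, a mapping template is simply a pre-filled `SchemaMapping` that a business edits.

```java
import java.util.*;

/** Hypothetical sketch of applying a relational schema mapping to one local row. */
class SchemaMapping {
    final String localTable;
    final String globalTable;
    // Metadata mapping: local column name -> global column name.
    private final Map<String, String> columnMap = new LinkedHashMap<>();
    // Value mapping: global column -> (local term -> global term).
    private final Map<String, Map<String, String>> valueMap = new HashMap<>();

    SchemaMapping(String localTable, String globalTable) {
        this.localTable = localTable;
        this.globalTable = globalTable;
    }

    SchemaMapping column(String localColumn, String globalColumn) {
        columnMap.put(localColumn, globalColumn);
        return this;
    }

    SchemaMapping value(String globalColumn, String localTerm, String globalTerm) {
        valueMap.computeIfAbsent(globalColumn, c -> new HashMap<>()).put(localTerm, globalTerm);
        return this;
    }

    /** Renames mapped columns, translates mapped terms, and drops local columns with no mapping. */
    Map<String, String> apply(Map<String, String> localRow) {
        Map<String, String> globalRow = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : columnMap.entrySet()) {
            if (!localRow.containsKey(e.getKey())) continue;
            String value = localRow.get(e.getKey());
            Map<String, String> terms = valueMap.getOrDefault(e.getValue(), Collections.emptyMap());
            globalRow.put(e.getValue(), terms.getOrDefault(value, value)); // unmapped terms pass through unchanged
        }
        return globalRow;
    }
}
```

For example, a local row {ORD_NO=42, STAT=OPN, INTERNAL_NOTE=x} mapped with `column("ORD_NO", "orderkey")`, `column("STAT", "orderstatus")` and `value("orderstatus", "OPN", "O")` becomes {orderkey=42, orderstatus=O}.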
Data Loader
The data loader is a component that extracts data from production systems to normal peer
instances according to the result of schema mapping. While the process of extracting and
transforming data is straightforward, the main challenge is maintaining consistency
between the raw data stored in the production systems and the extracted data stored in
the normal peer instance (and, subsequently, the data indices created from these extracted
data) while the raw data are being updated inside the production systems. We solve this
consistency problem as follows. When the data loader first extracts data from the
production system, besides storing the results in the normal peer instance, it also creates
a snapshot of the newly inserted data. After that, at regular intervals, the data loader
re-extracts data from the production system to create a new snapshot. This snapshot is
then compared with the previously stored one to detect data changes. Finally, the
changes are used to update the MySQL database hosted in the normal peer.
Given two consecutive data snapshots, we employ an algorithm similar to the one
proposed in. In our algorithm, the system first fingerprints every tuple of the tables in the
two snapshots to a unique integer.
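A sketch of snapshot comparison by fingerprints follows. The text does not say how fingerprints are computed, so the 64-bit FNV-1a hash used here is an assumption (distinct tuples can collide, although rarely), as are the class and method names; the cited algorithm itself is not reproduced.

```java
import java.nio.charset.StandardCharsets;
import java.util.*;

/** Hypothetical sketch: detect changes between two snapshots by comparing tuple fingerprints. */
class SnapshotDiff {

    static final class Changes {
        final List<List<String>> inserts = new ArrayList<>();
        final List<List<String>> deletes = new ArrayList<>();
    }

    /** 64-bit FNV-1a over the tuple's fields; an assumed fingerprint function. */
    static long fingerprint(List<String> tuple) {
        long h = 0xcbf29ce484222325L; // FNV offset basis
        // A control-character separator keeps ("ab","c") and ("a","bc") distinct.
        byte[] bytes = String.join("\u0001", tuple).getBytes(StandardCharsets.UTF_8);
        for (byte b : bytes) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L; // FNV prime
        }
        return h;
    }

    /** Tuples only in the new snapshot are inserts; tuples only in the old one are deletes. */
    static Changes diff(List<List<String>> oldSnapshot, List<List<String>> newSnapshot) {
        Map<Long, List<String>> oldFp = index(oldSnapshot);
        Map<Long, List<String>> newFp = index(newSnapshot);
        Changes c = new Changes();
        newFp.forEach((fp, t) -> { if (!oldFp.containsKey(fp)) c.inserts.add(t); });
        oldFp.forEach((fp, t) -> { if (!newFp.containsKey(fp)) c.deletes.add(t); });
        return c; // an updated tuple appears as one delete plus one insert
    }

    /** Maps fingerprint -> tuple in snapshot order; identical tuples collapse into one entry. */
    private static Map<Long, List<String>> index(List<List<String>> snapshot) {
        Map<Long, List<String>> m = new LinkedHashMap<>();
        for (List<String> t : snapshot) m.put(fingerprint(t), t);
        return m;
    }
}
```

The resulting inserts and deletes are what the data loader would apply to the normal peer's MySQL database.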
Data Indexer
In our system, the data are stored in the local MySQL database hosted by each normal
peer. Thus, to process a query, we need to locate the normal peers that host the tables
involved in the query. For example, to process a simple query like select R.a from R
where R.b=x, we need to know the locations of the peers that store tuples belonging to the
global table R.
We adopt peer-to-peer technology to solve the data locating problem and send queries
only to normal peers that host related data. In particular, we employ BATON, a balanced
binary tree overlay protocol, to organize all normal peers. Given a value domain
[L, U], each node in BATON is responsible for two ranges.
If we traverse the tree in order, we access the values in consecutive domains. In
BATON, each node maintains log₂ N routing neighbors in the same level (N is the number
of nodes), which are used to facilitate the search process in this index structure. To
achieve a balanced structure, BATON employs two flexible load balancing schemes. A node
can balance its load with its adjacent nodes when any of them is underloaded. However, if
no adjacent node is available for load balancing, BATON performs a global adjustment by
moving a non-adjacent leaf node from its original position to the overloaded region to
share the load. Since BATON organizes nodes as a balanced tree, such a scheme could incur
network restructuring.
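To make the lookup concrete, here is a deliberately simplified sketch. It assumes integer keys, gives each node its own range and the range covered by its subtree, and routes only along parent and child links. Real BATON also uses adjacent links and the same-level routing tables described above to reach O(log N) hops, so `BatonNode` and `locate` are hypothetical names, not the BATON API.

```java
/** Simplified, hypothetical model of a BATON-style range-partitioned balanced binary tree. */
class BatonNode {
    final String peerId;
    final int lo, hi;     // range owned by this node: [lo, hi)
    int subLo, subHi;     // range covered by the whole subtree rooted here
    BatonNode left, right, parent;

    BatonNode(String peerId, int lo, int hi) {
        this.peerId = peerId; this.lo = lo; this.hi = hi;
        this.subLo = lo; this.subHi = hi;
    }

    /** Attaches children; call bottom-up so the children's subtree ranges are already final. */
    void setChildren(BatonNode l, BatonNode r) {
        left = l; right = r;
        if (l != null) { l.parent = this; subLo = l.subLo; }
        if (r != null) { r.parent = this; subHi = r.subHi; }
    }

    /** Routes a point query from any start node to the owning peer, or null if no node owns the key. */
    static String locate(BatonNode start, int key) {
        BatonNode n = start;
        boolean descending = false;
        while (n != null) {
            if (key >= n.lo && key < n.hi) return n.peerId;     // owned here
            boolean inSubtree = key >= n.subLo && key < n.subHi;
            if (!inSubtree) {
                if (descending) return null;                    // gap in the domain: no owner
                n = n.parent;                                   // climb until a subtree covers the key
            } else {
                descending = true;
                n = key < n.lo ? n.left : n.right;              // in-order: smaller keys on the left
            }
        }
        return null;
    }
}
```

For the query on R.b = x, the peer returned for key x is the one the sub-query would be sent to.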
The system employs replication of index data in the BATON structure to ensure the
correct retrieval of index data in the presence of failures. Specifically, we use the two-tier
partial replication strategy to provide both data availability and load balancing, as
proposed in our recent study.


Distributed Access Control
Access to the multi-business data shared in a corporate network needs to be controlled
properly. The challenge is for BestPeer++ to provide a flexible and easy-to-use access
control scheme for the whole system while, at the same time, enabling each business to
decide which users can access its shared data in the inherently distributed environment
of corporate networks. BestPeer++ develops a distributed role-based access control
scheme. The basic idea is to use roles as templates that capture common data access
privileges and to allow businesses to override these privileges to meet their specific needs.
The roles are maintained locally and used during query processing to rewrite queries.
Specifically, given a query Q submitted by user u, the query processor sends the data
retrieval requests to the involved peers. Each peer, upon receiving a request, transforms it
based on u's access role, and data that cannot be accessed by u are not returned. For
example, if a user assigned to Rolesale tries to retrieve all tuples from lineitem, the peer
returns only the values of two columns: extendedprice and shipdate.
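The column-level rewrite in the Rolesale example can be sketched as follows. `RoleRewriter`, its policy map and the generated SQL string are hypothetical; the sketch only restricts the projection and ignores any row-level conditions a real policy might also impose.

```java
import java.util.*;

/** Hypothetical sketch of rewriting a projection according to a user's access role. */
class RoleRewriter {
    // role -> table -> columns the role may read (illustrative policy store)
    private final Map<String, Map<String, List<String>>> policy = new HashMap<>();

    void grant(String role, String table, String... columns) {
        policy.computeIfAbsent(role, r -> new HashMap<>()).put(table, Arrays.asList(columns));
    }

    /**
     * Returns a SELECT restricted to the columns the role may see, or null if none are visible.
     * "*" expands to every permitted column; explicit columns are intersected with the grant.
     */
    String rewrite(String role, String table, List<String> requested) {
        List<String> allowed = policy.getOrDefault(role, Collections.emptyMap())
                                     .getOrDefault(table, Collections.emptyList());
        List<String> visible = new ArrayList<>();
        for (String c : requested.equals(Collections.singletonList("*")) ? allowed : requested) {
            if (allowed.contains(c)) visible.add(c);
        }
        return visible.isEmpty() ? null : "SELECT " + String.join(", ", visible) + " FROM " + table;
    }
}
```

With `grant("Rolesale", "lineitem", "extendedprice", "shipdate")`, a request for all columns of lineitem is rewritten to `SELECT extendedprice, shipdate FROM lineitem`.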
Note that the system does not collect information about existing users in the collaborating
ERP databases, since doing so would lead to potential security issues. Instead, the user
management module of the system provides interfaces for the local administrator at each
participating organization to create new accounts for users who wish to access the
system's services.
6.2 DATA FLOW DIAGRAM

The DFD is also called a bubble chart. It is a simple graphical formalism that can be
used to represent a system in terms of the input data to the system, the various processing
carried out on this data, and the output data generated by the system.


6.2.1 USE CASE DIAGRAM
A use case diagram in the Unified Modeling Language (UML) is a type of behavioral
diagram defined by and created from a use-case analysis. Its purpose is to present a
graphical overview of the functionality provided by a system in terms of actors, their
goals (represented as use cases), and any dependencies between those use cases. The
main purpose of a use case diagram is to show what system functions are performed for
which actor. Roles of the actors in the system can be depicted.
Fig. 6.2 Use Case diagram

6.2.2 SEQUENCE DIAGRAM
A sequence diagram in Unified Modeling Language (UML) is a kind of interaction
diagram that shows how processes operate with one another and in what order. It is a
construct of a Message Sequence Chart. Sequence diagrams are sometimes called event
diagrams, event scenarios, and timing diagrams.


Fig. 6.3 Sequence diagram of system
6.2.3 ACTIVITY DIAGRAM
Activity diagrams are graphical representations of workflows of stepwise activities and
actions with support for choice, iteration and concurrency. In the Unified Modeling
Language, activity diagrams can be used to describe the business and operational
step-by-step workflows of components in a system. An activity diagram shows the overall
flow of control.


Fig. 6.4 Activity diagram of system
6.3 INPUT DESIGN

The input design is the link between the information system and the user. It comprises
developing the specifications and procedures for data preparation, that is, the steps
necessary to put transaction data into a usable form for processing. Eclipse plug-ins from
Java libraries (JAR files) are used for this purpose, and Swing (a GUI widget toolkit) is
used to provide an efficient user interface.


The design of input focuses on controlling the amount of input required, controlling
errors, avoiding delay, avoiding extra steps, and keeping the process simple. The input is
designed in such a way that it provides security and ease of use while retaining privacy.
The input design considered the following:
• What data should be given as input?
• How should the data be arranged or coded?
• The dialog to guide the operating personnel in providing input.
• Methods for preparing input validations and the steps to follow when errors occur.
6.3.1 OBJECTIVES
Input Design is the process of converting a user-oriented description of the input into a
computer-based system. This design is important to avoid errors in the data input process
and show the correct direction to the management for getting correct information from
the computerized system.
It is achieved by creating user-friendly data entry screens that can handle large volumes
of data. Swing has been used for this purpose in the project. The goal of designing input
is to make data entry easier and free from errors. The data entry screen is designed in
such a way that all data manipulations can be performed. It also provides record viewing
facilities.
The Eclipse IDE provides an efficient language toolkit for converting the user description
into a computer-based system.
When data is entered, it is checked for validity. Data can be entered with the help of
screens. Validation can be invoked manually on a resource or group of resources,
including XML, DTD and XML Schemas.
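The validity check on entered data can be illustrated with server-side checks for the dealer registration form. This is a hypothetical sketch: the field names follow the `user.*` fields of register.jsp, but `RegistrationValidator` and the specific rules (a loose e-mail pattern, a 10-digit mobile number, a non-negative deposit) are assumptions, not the project's actual validation code.

```java
import java.math.BigDecimal;
import java.util.*;
import java.util.regex.Pattern;

/** Hypothetical server-side checks for the dealer registration form fields. */
class RegistrationValidator {
    private static final Pattern EMAIL = Pattern.compile("[^@\\s]+@[^@\\s]+\\.[^@\\s]+"); // deliberately loose
    private static final Pattern MOBILE = Pattern.compile("\\d{10}");                     // assumed 10-digit format

    private static boolean blank(String s) { return s == null || s.trim().isEmpty(); }

    /** Returns one message per problem found; an empty list means the input can be accepted. */
    static List<String> validate(Map<String, String> form) {
        List<String> errors = new ArrayList<>();
        for (String field : Arrays.asList("firstname", "email", "username", "password", "mobile", "depositAmount")) {
            if (blank(form.get(field))) errors.add(field + " is required");
        }
        String email = form.get("email");
        if (!blank(email) && !EMAIL.matcher(email).matches()) errors.add("email is invalid");
        if (!Objects.equals(form.get("password"), form.get("repassword"))) errors.add("passwords do not match");
        String mobile = form.get("mobile");
        if (!blank(mobile) && !MOBILE.matcher(mobile).matches()) errors.add("mobile must be 10 digits");
        String deposit = form.get("depositAmount");
        if (!blank(deposit)) {
            try {
                if (new BigDecimal(deposit.trim()).signum() < 0) errors.add("depositAmount must not be negative");
            } catch (NumberFormatException e) {
                errors.add("depositAmount must be a number");
            }
        }
        return errors;
    }
}
```

Collecting every message instead of stopping at the first error lets the data entry screen show all problems at once.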
6.4 OUTPUT DESIGN

A quality output is one that meets the requirements of the end user and presents the
information clearly. In any system, the results of processing are communicated to the users
and to other systems through outputs. Output design determines how the information is to
be displayed for immediate need and also the hard copy output. Output is the most
important and direct source of information to the user. Efficient and intelligent output
design improves the system's relationship with the user and helps in decision-making.
• Designing computer output should proceed in an organized, well thought out
manner; the right output must be developed while ensuring that each output
element is designed so that people will find the system easy to use effectively.
When analysts design computer output, they should identify the specific output
that is needed to meet the requirements.
• Select methods for presenting information.
• Create documents, reports, or other formats that contain information produced by
the system.
6.4.1 OBJECTIVES
• Convey information about past activities, current status or projections of the
future.
• Trigger an action.
• Confirm an action.


6.5 Algorithms
The algorithms used are the BootStrapDaemon() algorithm and the adaptive query
processing algorithm.


6.6 A Parallel P2P Processing Approach
The idea of parallel processing is shown in Fig. 6.5. For each join, instead of forwarding
all tuples into a single processing node, we disseminate them into a set of nodes, which
will process the join in parallel. We adopt the conventional replicated join approach.
Namely, the small table will be replicated to all processing nodes and joined with a
partition of the large table. For example, in Fig. 6.5, table S is replicated to two nodes and
joined with the partitions of R (R1 and R2). When a query involves multiple joins and
group by, the query plan can be expressed as a processing graph.
Given a query Q, the processing graph $G = (V, E)$ is generated as follows:
1. For each node $v_i \in V$, we assign a level ID to $v_i$, denoted as $f(v_i)$.
2. The root node $v_0$ represents the peer that accepts the query, which is responsible for
collecting the results for the user; $f(v_0) = 0$.
3. Suppose Q involves $x$ joins and $y$ "Group By" attributes. The maximal level $L$ of the
graph satisfies $L \le x + f(y)$, where $f(y) = 1$ if $y \ge 1$ and $f(y) = 0$ otherwise. In this
way, we generate a level of nodes for each join operator and for the "Group By" operator.
4. Except for the root node, every node processes only one join operator or the "Group
By" operator.
5. Nodes of level $L$ accept input data from BestPeer++'s storage system (e.g., local
databases). After completing its processing, node $v_i$ sends its data to the nodes in
level $f(v_i) - 1$.
6. All operators that are not evaluated in non-root nodes are processed by the root.
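The level construction in steps 1 to 6 can be illustrated with a small helper. This is a hypothetical sketch: the class and method names are ours, operators are plain strings, and placing the "Group By" level directly under the root with the joins below it is an assumption consistent with step 5 (the deepest level reads from local storage), not something the text specifies.

```java
import java.util.*;

/** Hypothetical sketch of laying out the levels of a processing graph. */
class ProcessingGraphLevels {

    /** f(y) from step 3: one extra level if the query has at least one Group By attribute. */
    static int f(int groupByAttributes) {
        return groupByAttributes >= 1 ? 1 : 0;
    }

    /**
     * Returns the operator assigned to each level 0..L, where L = x + f(y) for x joins.
     * joinsRootToLeaf lists the joins from the one nearest the root to the one at the deepest
     * level, whose nodes read from local databases. Putting Group By at level 1 is an assumption.
     */
    static List<String> levels(List<String> joinsRootToLeaf, int groupByAttributes) {
        List<String> levels = new ArrayList<>();
        levels.add("root");                          // level 0: collects results for the user
        if (f(groupByAttributes) == 1) levels.add("group-by");
        levels.addAll(joinsRootToLeaf);              // each level sends its output to level - 1
        return levels;
    }
}
```

For two joins and one "Group By" attribute this yields levels 0 to 3, i.e. L = 2 + 1.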
In the replicated join, we trade off network cost (a table is replicated to multiple
nodes) for parallelism. The benefit may be neutralized when a large number of
tuples are re-partitioned in the P2P network. Therefore, we propose a model to estimate
the cost. The parameters used in the model are summarized in Table 6.1.
The intermediate result from level $i+1$ needs to be broadcast to all of the $t(T_i)$
partitions of table $T_i$ involved in level $i$'s join. In this cost model, we assume that I/O
(local and network communication) and CPU time dominate the overall cost. First, we
define the workload of the $i$th replicated join as the product of the previous step's
workload and the number of partitions of table $T_i$.
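Writing $W_i$ for the workload of the $i$th replicated join (notation introduced here for illustration), the definition reads $W_i = W_{i+1} \cdot t(T_i)$. A minimal sketch that unrolls this recurrence, assuming a given workload for the input at the deepest level and the partition counts $t(T_i)$ listed from the deepest join upward:

```java
/** Hypothetical sketch of the replicated-join workload recurrence W_i = W_{i+1} * t(T_i). */
class ReplicatedJoinCost {
    /**
     * partitions[k] holds t(T_i) for the joins ordered from the deepest level upward;
     * baseWorkload is an assumed workload for the input read at the deepest level.
     * Returns the workload after each join in the same order.
     */
    static double[] workloads(double baseWorkload, int[] partitions) {
        double[] w = new double[partitions.length];
        double previous = baseWorkload;
        for (int k = 0; k < partitions.length; k++) {
            previous = previous * partitions[k]; // broadcast to every partition of T_i
            w[k] = previous;
        }
        return w;
    }
}
```

With a base workload of 100 and partition counts 2, 3 and 4, the workloads are 200, 600 and 2400: the multiplicative growth is the recursive replication that makes the MapReduce engine preferable for large jobs.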


Fig. 6.5 Parallel P2P processing
Table 6.1 Notations for cost modelling


CHAPTER 7
TESTING
7.1 PRIVACY MAINTAINING WEB SEARCH WITH
SENSITIVITY
After the development of any computer-based system is finished, the next complicated and
time-consuming process is system testing. Only during testing can the development
company know how far the user requirements have been met. Software testing is an
important element of software quality assurance and represents the ultimate review of
specification, design and coding. The increasing visibility of software as a system element
and the costs associated with software failures are motivating forces for well-planned,
thorough testing.
Testing Objectives
There are several rules that can serve as testing objectives:
• Testing is a process of executing a program with the intent of finding an error.
• A good test case is one that has a high probability of finding an undiscovered error.
Testing procedures for the project are carried out in the following sequence:
• System testing is done to check the server names of the machines connected
between the customer and the executive.
• The product information provided by the company to the executive is
validated against the centralized data store.
• System testing is also done to check the executive's availability to
connect to the server.
• The server name authentication and its availability to the customer are checked.
• The viability of the communication chat line is tested to make the chat
system function properly.
• Mail functions are tested for user concurrency, and customer mail dates are
validated.
The following are some of the testing methods applied to this project:
7.2 SOURCE CODE TESTING
This examines the logic of the system. If the system produces the output required by the
user, then we can say that the logic is correct.
7.3 SPECIFICATION TESTING:
This testing specifies what the program should do and how it should perform under various
conditions. It is a comparative study of the evaluation of system performance against the
system requirements.
7.4 MODULE LEVEL TESTING:
In this testing, errors are found in each individual module. It encourages the programmer to
find and rectify errors without affecting the other modules.
7.5 UNIT TESTING:
Unit testing focuses verification effort on the smallest unit of software design, the module.
The local data structure is examined to ensure that data stored temporarily maintains its
integrity during all steps of the algorithm's execution. Boundary conditions are tested to
ensure that the module operates properly at the boundaries established to limit or restrict
processing.
7.6 INTEGRATION TESTING:
Data can be lost across an interface, and one module can have an inadvertent, adverse
effect on another. Integration testing is a systematic technique for constructing the
program structure while conducting tests to uncover errors associated with interfacing.


7.7 VALIDATION TESTING:
Validation testing begins after integration testing is complete and the software is fully
assembled. Validation succeeds when the software functions in a manner that can be
reasonably expected by the client. In this project, most of the validation is done during
data entry, where there is the greatest possibility of entering wrong data. Other validation
is performed in all processes where correct details and data must be entered to get the
required results.
7.8 RECOVERY TESTING:
Recovery testing is a system test that forces the software to fail in a variety of ways and
verifies that recovery is properly performed. If recovery is automatic, re-initialization and
data recovery are each evaluated for correctness.
7.9 SECURITY TESTING:
Security testing attempts to verify that the protection mechanisms built into the system will,
in fact, protect it from improper penetration. The tester may attempt to acquire passwords
through external clerical means, may attack the system with custom software designed to
break down any defenses, and may purposely cause errors.
7.10 PERFORMANCE TESTING:
Performance testing is used to test the run-time performance of software within the
context of an integrated system. Performance tests are often coupled with stress testing
and require both hardware and software instrumentation.
7.11 BLACKBOX TESTING:
Black-box testing focuses on the functional requirements of the software. It enables the
tester to derive sets of input conditions that will fully exercise all functional requirements
for a program. Black-box testing attempts to find errors in the following categories:
• Incorrect or missing functions
• Interface errors
• Errors in data structures or external database access
• Performance errors
7.12 OUTPUT TESTING:
After validation testing, the next step is output testing of the proposed system, since no
system can be considered useful until it produces the required output in the specified
format. Output format is considered in two ways: the screen format and the printer
format.
7.13 USER ACCEPTANCE TESTING:
User acceptance testing is the key factor for the success of any system. The system
under consideration is tested for user acceptance by constantly keeping in touch with
prospective system users during development and making changes whenever
required.


CHAPTER 8
PSEUDOCODE
adminhome.jsp
<%@ page language="java" contentType="text/html; charset=ISO-8859-1"
pageEncoding="ISO-8859-1"%>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<%@taglib uri="/struts-tags" prefix="s"%>
<%@taglib uri="/struts-jquery-tags" prefix="sj"%>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Car Dealer</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<link href="stylesheet.css" rel="stylesheet" type="text/css" />
<script type="text/javascript">
// NOTE: this check runs in the browser, so the admin/admin credentials are visible
// to anyone who views the page source; real authentication belongs on the server.
function callAdminLogin(){
var user=document.getElementById("username").value;
var pass=document.getElementById("password").value;
if(user=="admin" && pass=="admin"){
document.cloudForm.action='admin_home.action';
document.cloudForm.submit();
}else{
alert("Authentication Failed");
}
}
// Each navigation handler points the shared form at a Struts action and submits it.
function callDealerRegister(){
document.cloudForm.action='goto_register_page.action';
document.cloudForm.submit();
}
function callSearch(){
document.cloudForm.action='search_page.action';
document.cloudForm.submit();
}
</script>
</head>
<body>
<form name="cloudForm" id="cloudForm" method="post">
<div id="layout">
<div id="titlebg">
<div class="title">CAR</div>
<div class="title1">DEALER</div>
<div class="title2">WE HAVE</div>
<div class="title3">THE BEST CARS</div>
<div class="title4" align="center"> User <a target="_blank"
href="http://jigsaw.w3.org/css-validator/" class="title4">Administrator</a></div>
</div>
<hr id="hrline" />
<div id="gradientbg">
<div id="links">
<div id="arrow"></div>
<div class="linktxt"> <a href="#" class="linktxt">HOME</a> </div>
<div id="arrowabout"></div>
<div class="linktxt"><a href="#" class="linktxt"
onclick="callDealerRegister()">DEALER</a> </div>
<div id="arrowservice"></div>
<div class="linktxt"><a href="#" onclick="callSearch()"
class="linktxt">SEARCH</a></div>
<div id="arrowcatalogue"></div>
<div class="linktxt"><a href="#" class="linktxt">SALES</a></div>
<div id="arrowcontact"></div>
<div class="linktxt"><a href="#" class="linktxt">PURCHASE</a></div>
</div>
<div id="header"></div>
</div>
<div id="bodypart">
<div id="catalogue">
<div class="cat">CATALOGUE</div>
<div id="cat1"></div>
<div id="car1"></div>
</div>
<div id="line"></div>
<div id="ourservices">
<div class="our">OUR SERVICES</div>
<div id="pic1"></div>
<div class="lorem">Lorem ipsum</div>
<div class="txt">service </div>
</div>
<div id="line1"></div>
<div id="aboutus">
<div class="abt">ABOUT US</div>
<div class="lorem1">Lorem ipsum</div>
<div class="txt1">service </div>
</div>
</div>
<hr id="hrline1" />
<div class="foottxt">Copyright © Cars. Design by <a
href="http://www.dotcominfoway.com/" class="foottxt">DOT COM
INFOWAY</a>.</div>
</div>
<div align=center>This template downloaded from <a href='#'>free website
templates</a></div>
</form>
</body>
</html>
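register.jsp
<%@ page language="java" contentType="text/html; charset=ISO-8859-1"
pageEncoding="ISO-8859-1"%>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<%@taglib uri="/struts-tags" prefix="s"%>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Car Dealer</title>
<link href="stylesheet.css" rel="stylesheet" type="text/css" />
<script type="text/javascript">
function registerSubmit(){
document.cloudForm.action='register.action';
document.cloudForm.submit();
}
function callDealerRegister(){
document.cloudForm.action='goto_register_page.action';
document.cloudForm.submit();
}
function callSearch(){
document.cloudForm.action='search_page.action';
document.cloudForm.submit();
}
</script>
</head>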
<body>
<div id="layout">
<div id="titlebg">
<div class="title4" align="center"> User <a target="_blank"
href="http://jigsaw.w3.org/css-validator/" class="title4">Administrator</a></div>
</div>
<hr id="hrline" />
<div id="gradientbg">
<div id="links">
<div class="linktxt"> <a href="/" class="linktxt">HOME</a> </div>
<div class="linktxt"><a href="#" class="linktxt"
onclick="callDealerRegister()">DEALER</a> </div>
<div class="linktxt"><a href="#" onclick="callSearch()"
class="linktxt">SEARCH</a></div>
<div class="linktxt"><a href="#" class="linktxt">SALES</a></div>
<div class="linktxt"><a href="#" class="linktxt">PURCHASE</a></div>
</div>
</div>
<table width="100%">
<tr>
<td align="center">
<H4 style="color: white;">
<s:if test="%{msg!=null}">
<s:property value="%{msg}"/>
</s:if>
</H4>
<h3 style="color: white;">Create dealer Form</h3>
<s:form name="cloudForm" id="cloudForm" theme="simple" method="post">
<table class="normaltxt">
<tr>
<td> Name</td>
</tr>
<tr>
<td><s:textfield name="user.firstname"
id="firstname" cssClass="required input_field"></s:textfield> </td>
</tr>
<tr>
<td> Email</td>
</tr>
<tr>
<td><s:textfield name="user.email" id="email"
cssClass="required input_field"></s:textfield> </td>
</tr>
<tr>
<td> Username</td>
</tr>
<tr>
<td><s:textfield name="user.username"
id="username" cssClass="required input_field"></s:textfield> </td>
</tr>
<tr>
<td>Password</td>
</tr>
<tr>
<td><s:password name="user.password"
cssClass="required input_field" id="password"></s:password></td>
</tr>
<tr>
<td>Retype Password</td>
</tr>
<tr>
<td><s:password name="user.repassword"
cssClass="required input_field" id="repassword"></s:password></td>
</tr>
<tr>
<td> Mobile</td>
</tr>
<tr>
<td><s:textfield name="user.mobile" id="mobile"
cssClass="required input_field"></s:textfield> </td>
</tr>
<tr>
<td> Show room name</td>
</tr>
<tr>
<td><s:textfield name="user.showRoomName"
id="showRoomName" cssClass="required input_field"></s:textfield> </td>
</tr>
<tr>
<td> Show room location</td>
</tr>
<tr>
<td><s:textfield name="user.showRoomLocation"
id="showRoomLocation" cssClass="required input_field"></s:textfield> </td>
</tr>
<tr>
<td> Show room address</td>
</tr>
<tr>
<td><s:textarea name="user.showRoomAddress"
id="showRoomAddress" cssClass="required input_field"></s:textarea></td>
</tr>
<tr>
<td>Show room city</td>
</tr>
<tr>
<td><s:textfield name="user.showRoomCity"
id="showRoomCity" cssClass="required input_field"></s:textfield> </td>
</tr>
<tr>
<td>Deposit Amount</td>
</tr>
<tr>
<td><s:textfield name="user.depositAmount"
id="depositAmount" cssClass="required input_field"></s:textfield> </td>
</tr>
<tr>
<td colspan="2" style="padding-top: 20px;">
<input type="button"
onclick="registerSubmit()" class="submit_btn" value="Register">
</td>
</tr>
</table>
</s:form>
</td>
</tr>
</table>
</div>
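<hr id="hrline1" />
<div class="foottxt">Copyright © Cars. Design by <a
href="http://www.dotcominfoway.com/" class="foottxt">DOT COM
INFOWAY</a>.</div>
<div align=center>This template downloaded from <a href='#'>free website
templates</a></div>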
</body>
</html>
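login.jsp
<%@ page language="java" contentType="text/html; charset=ISO-8859-1"
pageEncoding="ISO-8859-1"%>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<%@taglib uri="/struts-tags" prefix="s"%>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Car Dealer</title>
<link href="stylesheet.css" rel="stylesheet" type="text/css" />
<script type="text/javascript">
function callDealerLogin(){
document.cloudForm.action='goto_login_page.action';
document.cloudForm.submit();
}
function loginSubmit(){
document.cloudForm.action='login.action';
document.cloudForm.submit();
}
</script>
</head>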
<body>
<div id="layout">
<div id="titlebg">
<div class="title4" align="center"> Links: <a target="_blank"
href="http://jigsaw.w3.org/css-validator/" class="title4">CSS VALIDATOR</a></div>
</div>
<hr id="hrline" />
<div id="gradientbg">
<div id="links">
<div class="linktxt"> <a href="/" class="linktxt">HOME</a> </div>
<div class="linktxt"><a href="#" class="linktxt">DEALER</a> </div>
<div class="linktxt"><a href="#" class="linktxt">SERVICES</a></div>
<div class="linktxt"><a href="#" onclick="callDealerLogin()"
class="linktxt">LOGIN</a></div>
<div class="linktxt"><a href="#" class="linktxt">CONTACTS</a></div>
</div>
</div>
<div id="bodypart">
<div id="catalogue">
<div class="cat">CATALOGUE</div>
</div>
<div id="line"></div>
<div id="ourservices">
<div class="our">OUR SERVICES</div>
<div class="lorem">Lorem ipsum</div>
<div class="txt">service </div>
</div>
<div id="line1"></div>
<div id="aboutus">
<div class="abt">Dealer Login</div>
<span style="color: white;">
<s:if test="%{msg!=null}">
<s:property value="%{msg}"/>
</s:if>
</span>
<s:form name="cloudForm" id="cloudForm" theme="simple" method="post">
<table>
<tr>
<td class="normaltxt"> Username</td>
</tr>
<tr>
<td><s:textfield name="user.username"
id="username"></s:textfield> </td>
</tr>
<tr>
<td class="normaltxt">Password</td>
</tr>
<tr>
<td><s:password name="user.password"
id="password"></s:password></td>
</tr>
<tr>
<td colspan="2" style="padding-top: 20px;">
<input type="button"
onclick="loginSubmit()" class="submit_btn" value="Login">
</td>
</tr>
</table>
</s:form>
</div>
</div>
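<hr id="hrline1" />
<div class="foottxt">Copyright © Cars. Design by <a
href="http://www.dotcominfoway.com/" class="foottxt">DOT COM
INFOWAY</a>.</div>
</div>
<div align=center>This template downloaded from <a href='#'>free website templates</a></div>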
</body>
</html>
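DatabaseConnection.java
package com.peer.database;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import com.peer.utility.StaticInfo;
public class DatabaseConnection {
// Opens a new JDBC connection using the URL and credentials held in StaticInfo.
// The caller is responsible for closing it (e.g. with try-with-resources).
public Connection getConnection() throws ClassNotFoundException, SQLException {
// Legacy Connector/J 5.x driver class; Connector/J 8+ uses com.mysql.cj.jdbc.Driver.
// With JDBC 4 drivers the explicit Class.forName call is optional.
String driverClassName = "com.mysql.jdbc.Driver";
String url = StaticInfo.url;
String username = StaticInfo.dbUser;
String password = StaticInfo.dbPass;
Class.forName(driverClassName);
return DriverManager.getConnection(url, username, password);
}
}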
User.java
package com.peer.bo;
// Business object holding a registered user's details.
public class User {
    private int id;
    private String firstname;
    private String username;
    private String email;
    private String createdOn;
    private boolean deleted;
    private String mobile;
    private String password;
    private String repassword;
    private String ipaddress;
    private boolean connect;

    public int getId() {
        return id;
    }
    public void setId(int id) {
        this.id = id;
    }
    public String getFirstname() {
        return firstname;
    }
    public void setFirstname(String firstname) {
        this.firstname = firstname;
    }
    public String getUsername() {
        return username;
    }
    public void setUsername(String username) {
        this.username = username;
    }
    public String getEmail() {
        return email;
    }
    public void setEmail(String email) {
        this.email = email;
    }
    public boolean isDeleted() {
        return deleted;
    }
    public void setDeleted(boolean deleted) {
        this.deleted = deleted;
    }
    public String getMobile() {
        return mobile;
    }
    public void setMobile(String mobile) {
        this.mobile = mobile;
    }
    public String getPassword() {
        return password;
    }
    public void setPassword(String password) {
        this.password = password;
    }
    public String getRepassword() {
        return repassword;
    }
    public void setRepassword(String repassword) {
        this.repassword = repassword;
    }
    public String getCreatedOn() {
        return createdOn;
    }
    public void setCreatedOn(String createdOn) {
        this.createdOn = createdOn;
    }
    public String getIpaddress() {
        return ipaddress;
    }
    public void setIpaddress(String ipaddress) {
        this.ipaddress = ipaddress;
    }
    public boolean isConnect() {
        return connect;
    }
    public void setConnect(boolean connect) {
        this.connect = connect;
    }
}
Request.java
package com.peer.bo;
// Business object for a service request raised by a user.
public class Request {
    private int id;
    private String username;
    private String service;
    private int userId;
    private String requestOn;

    public String getUsername() {
        return username;
    }
    public void setUsername(String username) {
        this.username = username;
    }
    public String getService() {
        return service;
    }
    public void setService(String service) {
        this.service = service;
    }
    public int getUserId() {
        return userId;
    }
    public void setUserId(int userId) {
        this.userId = userId;
    }
    public String getRequestOn() {
        return requestOn;
    }
    public void setRequestOn(String requestOn) {
        this.requestOn = requestOn;
    }
    public int getId() {
        return id;
    }
    public void setId(int id) {
        this.id = id;
    }
}
PeerReceiver.java
package com.peer.utility;
import java.io.ObjectInputStream;
import java.net.ServerSocket;
import java.net.Socket;
import com.peer.bo.Packet;
import com.peer.ui.PeerHome;
// Listens on a port for Packet replies and updates the PeerHome screen.
public class PeerReceiver extends Thread {
    Socket socket = null;
    ServerSocket serverSocket = null;
    ObjectInputStream ois = null;
    Sender sender = new Sender();
    public PeerReceiver(int port) {
        try {
            serverSocket = new ServerSocket(port);
            start(); // begin accepting connections in run()
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    public void run() {
        try {
            while (true) {
                socket = serverSocket.accept();
                ois = new ObjectInputStream(socket.getInputStream());
                Object obj = ois.readObject();
                if (obj instanceof Packet) {
                    Packet packet = (Packet) obj;
                    if (packet.result.equalsIgnoreCase("success")) {
                        StaticInfo.msg("Your Database Instance successfully connected");
                        PeerHome.textField1.setEditable(false);
                        PeerHome.textField2.setEditable(false);
                        PeerHome.button1.setEnabled(false);
                        PeerHome.button1.setLabel("Connected");
                    } else {
                        StaticInfo.msg("Connection failure enter correct credentials");
                    }
                }
                ois.close();    // release the stream and socket of this connection
                socket.close();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
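Sender.java
package com.peer.utility;
import java.io.ObjectOutputStream;
import java.net.Socket;
import com.peer.bo.Packet;
// Sends a single Packet object to the peer listening on sysname:port.
public class Sender {
    Socket socket = null;
    ObjectOutputStream oos = null;
    public void send(String sysname, int port, Packet obj) {
        try {
            socket = new Socket(sysname, port);
            oos = new ObjectOutputStream(socket.getOutputStream());
            oos.writeObject(obj);
            oos.flush();
            oos.close(); // closing the stream also closes the socket
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}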
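Sender and PeerReceiver together implement a one-shot exchange: the sender opens a socket and writes one serialized Packet with ObjectOutputStream, and the receiver reads it with ObjectInputStream and checks its result field. The following sketch reproduces that exchange inside a single JVM so it can be run without the rest of the application. The nested Packet class is a hypothetical stand-in for com.peer.bo.Packet (only the result field that PeerReceiver reads is modelled), and PacketRoundTrip, send, receiveOnce and roundTrip are illustrative names, not part of the project.

```java
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.net.ServerSocket;
import java.net.Socket;

public class PacketRoundTrip {
    // Hypothetical stand-in for com.peer.bo.Packet: only the 'result' field
    // that PeerReceiver inspects is modelled here.
    static class Packet implements Serializable {
        private static final long serialVersionUID = 1L;
        final String result;
        Packet(String result) { this.result = result; }
    }

    // Sender side (mirrors Sender.send): open a socket, write one object, close.
    static void send(String host, int port, Packet packet) throws Exception {
        try (Socket socket = new Socket(host, port);
             ObjectOutputStream oos = new ObjectOutputStream(socket.getOutputStream())) {
            oos.writeObject(packet);
            oos.flush();
        }
    }

    // Receiver side (mirrors one iteration of PeerReceiver.run): accept a
    // connection, read one object and classify it by its result field.
    static String receiveOnce(ServerSocket server) throws Exception {
        try (Socket socket = server.accept();
             ObjectInputStream ois = new ObjectInputStream(socket.getInputStream())) {
            Object obj = ois.readObject();
            if (obj instanceof Packet && "success".equalsIgnoreCase(((Packet) obj).result)) {
                return "connected";
            }
            return "failure";
        }
    }

    // Runs sender and receiver against each other on a free local port.
    public static String roundTrip(String result) {
        try (ServerSocket server = new ServerSocket(0)) { // port 0: any free port
            server.setSoTimeout(5000); // fail instead of hanging if nothing connects
            int port = server.getLocalPort();
            Thread senderThread = new Thread(() -> {
                try {
                    send("localhost", port, new Packet(result));
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
            senderThread.start();
            String outcome = receiveOnce(server);
            senderThread.join();
            return outcome;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("success")); // connected
        System.out.println(roundTrip("denied"));  // failure
    }
}
```

Try-with-resources closes each accepted socket and its stream, so a long-running receiver loop built this way does not leak one file descriptor per connection.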
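UserDAO.java
package com.peer.dao;
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import com.peer.bo.Request;
import com.peer.bo.User;
import com.peer.database.DatabaseConnection;
import com.peer.utility.StaticInfo;
public class UserDAO {
    /*'check_username_already_exists'
    'get_user_details_for_user_id'
    'get_user_id_for_user_name'
    'insert_new_user'
    'list_users'
    'check_login'*/
    DatabaseConnection dbCon = new DatabaseConnection();

    // Returns true if a user with this username already exists.
    public boolean checkUsernameAlreadyExists(String username) {
        boolean status = false;
        try (Connection con = dbCon.getConnection();
             CallableStatement call = con.prepareCall("{call check_username_already_exists(?)}")) {
            call.setString(1, username);
            try (ResultSet rs = call.executeQuery()) {
                status = rs.next();
            }
        } catch (SQLException | ClassNotFoundException e) {
            e.printStackTrace();
        }
        return status;
    }

    // Returns true if the username/password pair matches a stored user.
    public boolean checkLogin(String username, String password) {
        boolean status = false;
        try (Connection con = dbCon.getConnection();
             CallableStatement call = con.prepareCall("{call check_login(?,?)}")) {
            call.setString(1, username);
            call.setString(2, password);
            try (ResultSet rs = call.executeQuery()) {
                status = rs.next();
            }
        } catch (SQLException | ClassNotFoundException e) {
            e.printStackTrace();
        }
        return status;
    }

    // Copies the current result-set row into a User object.
    private User mapUser(ResultSet rs) throws SQLException {
        User user = new User();
        user.setId(rs.getInt("id"));
        user.setEmail(rs.getString("email"));
        user.setUsername(rs.getString("username"));
        user.setFirstname(rs.getString("firstname"));
        String mobile = rs.getString("mobile");
        user.setMobile(mobile == null ? "" : mobile);
        user.setCreatedOn(rs.getString("created_on"));
        user.setIpaddress(rs.getString("ipaddress"));
        // is_connect is 'N' when the user is not connected
        user.setConnect(!"N".equalsIgnoreCase(rs.getString("is_connect")));
        return user;
    }

    public User getUserDetailsForUserId(int userId) {
        User user = new User();
        try (Connection con = dbCon.getConnection();
             CallableStatement call = con.prepareCall("{call get_user_details_for_user_id(?)}")) {
            call.setInt(1, userId);
            try (ResultSet rs = call.executeQuery()) {
                if (rs.next()) {
                    user = mapUser(rs);
                }
            }
        } catch (SQLException | ClassNotFoundException e) {
            e.printStackTrace();
        }
        return user;
    }

    public User getUserDetailsForUsername(String username) {
        User user = new User();
        try (Connection con = dbCon.getConnection();
             CallableStatement call = con.prepareCall("{call get_user_details_for_username(?)}")) {
            call.setString(1, username);
            try (ResultSet rs = call.executeQuery()) {
                if (rs.next()) {
                    user = mapUser(rs);
                }
            }
        } catch (SQLException | ClassNotFoundException e) {
            e.printStackTrace();
        }
        return user;
    }

    public void insertNewUser(String firstname, String email, String username,
                              String password, String mobile, String ipaddress) {
        try (Connection con = dbCon.getConnection();
             CallableStatement call = con.prepareCall("{call insert_user(?,?,?,?,?,?)}")) {
            call.setString(1, firstname);
            call.setString(2, email);
            call.setString(3, username);
            call.setString(4, password);
            call.setString(5, mobile);
            call.setString(6, ipaddress);
            call.execute();
        } catch (SQLException | ClassNotFoundException e) {
            e.printStackTrace();
        }
    }

    // Calls the update_connect procedure for the given username.
    public void updateConnect(String username) {
        try (Connection con = dbCon.getConnection();
             CallableStatement call = con.prepareCall("{call update_connect(?)}")) {
            call.setString(1, username);
            call.execute();
        } catch (SQLException | ClassNotFoundException e) {
            e.printStackTrace();
        }
    }

    // Creates a database named dbName. JDBC cannot bind identifiers as
    // parameters, so dbName is concatenated and must come from a trusted source.
    public void insertNewDatabase(String dbName) {
        try (Connection con = dbCon.getConnection();
             Statement stmt = con.createStatement()) {
            stmt.executeUpdate("CREATE DATABASE " + dbName);
        } catch (SQLException | ClassNotFoundException e) {
            e.printStackTrace();
        }
    }

    public ArrayList<User> listUsers() {
        ArrayList<User> userList = new ArrayList<User>();
        try (Connection con = dbCon.getConnection();
             CallableStatement call = con.prepareCall("{call list_users()}");
             ResultSet rs = call.executeQuery()) {
            while (rs.next()) {
                userList.add(mapUser(rs));
            }
        } catch (SQLException | ClassNotFoundException e) {
            e.printStackTrace();
        }
        return userList;
    }

    public ArrayList<Request> listRequest() {
        ArrayList<Request> reqList = new ArrayList<Request>();
        try (Connection con = dbCon.getConnection();
             CallableStatement call = con.prepareCall("{call list_request()}");
             ResultSet rs = call.executeQuery()) {
            while (rs.next()) {
                Request request = new Request();
                request.setId(rs.getInt("id"));
                request.setUsername(rs.getString("req_username"));
                request.setUserId(rs.getInt("req_user_id"));
                request.setRequestOn(rs.getString("req_on"));
                request.setService(rs.getString("req_service"));
                reqList.add(request);
            }
        } catch (SQLException | ClassNotFoundException e) {
            e.printStackTrace();
        }
        return reqList;
    }
}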
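insertNewDatabase builds its CREATE DATABASE statement by string concatenation, because JDBC placeholders (?) bind values but not identifiers such as database names. A common guard is to whitelist the name before concatenating it. The sketch below assumes a policy of ASCII letters, digits and underscores, at most 64 characters (MySQL's identifier length limit); DatabaseNameGuard and createDatabaseSql are hypothetical names, and the policy may be stricter than the project actually needs.

```java
import java.util.regex.Pattern;

public class DatabaseNameGuard {
    // Hypothetical policy: ASCII letters, digits and underscores only,
    // 1 to 64 characters (64 is MySQL's maximum identifier length).
    private static final Pattern SAFE_NAME = Pattern.compile("[A-Za-z0-9_]{1,64}");

    // Returns the CREATE DATABASE statement, or throws if the name is unsafe.
    public static String createDatabaseSql(String dbName) {
        if (dbName == null || !SAFE_NAME.matcher(dbName).matches()) {
            throw new IllegalArgumentException("Invalid database name: " + dbName);
        }
        // Backticks quote the identifier in MySQL syntax.
        return "CREATE DATABASE `" + dbName + "`";
    }

    public static void main(String[] args) {
        System.out.println(createDatabaseSql("peer_db_1")); // CREATE DATABASE `peer_db_1`
        try {
            createDatabaseSql("x; DROP DATABASE mysql");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected"); // rejected
        }
    }
}
```

The returned string could then be passed to Statement.executeUpdate in place of the raw concatenation; because the whitelist excludes spaces, semicolons and quotes, a name cannot smuggle a second SQL statement.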


CHAPTER 9
RESULTS
a) Admin Login
b) User List
c) Peer Home
d) Home Screen
e) Create User
f) List of Users
g) Peer Home Connection
h) Create User
i) Admin Home
j) Create User Request Received
k) Service Request List


CHAPTER 10
CONCLUSION
We have discussed the unique challenges posed by sharing and processing data in an
inter-business environment and proposed a system that delivers elastic data sharing
services by integrating database and peer-to-peer technologies. The benchmark conducted
on our system, deployed on the cloud in a real-time environment, shows that it can
efficiently handle typical workloads in a corporate network and can deliver near-linear
query throughput as the number of normal peers grows. Our system is therefore a promising
solution for efficient data sharing within corporate networks.
10.1 SCOPE FOR FUTURE WORK

To enhance the usability of conventional P2P networks, the database community has
proposed a series of peer-to-peer database management systems (PDBMS) that integrate
state-of-the-art database techniques into P2P systems. These PDBMS fall into two classes:
structured and unstructured systems. Future work could optimize the PDBMS to overcome
its known problems, such as the lack of guaranteed data retrieval, limited performance,
and uneven result quality. Better technologies could also be used to connect the peers,
making the network more robust, fault tolerant and self-learning.
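One concrete step toward a more robust, fault-tolerant peer connection is to bound and retry each connection attempt. The project's Sender opens a plain Socket with no explicit connect timeout and gives up on the first error. The sketch below connects with an explicit per-attempt timeout and retries with exponential backoff; RetryingConnector, connectWithRetry and its parameters are hypothetical names chosen for illustration, not part of the project.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class RetryingConnector {
    // Connects to host:port, waiting at most timeoutMs per attempt. After a
    // failed attempt it sleeps, doubling the sleep each time (exponential backoff).
    public static Socket connectWithRetry(String host, int port, int timeoutMs,
                                          int maxAttempts, long baseDelayMs)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException lastError = null;
        long delayMs = baseDelayMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Socket socket = new Socket();
            try {
                socket.connect(new InetSocketAddress(host, port), timeoutMs);
                return socket; // connected: the caller owns and must close it
            } catch (IOException e) {
                lastError = e;
                socket.close();
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMs);
                    delayMs *= 2;
                }
            }
        }
        throw lastError; // every attempt failed
    }

    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0);
             Socket socket = connectWithRetry("localhost", server.getLocalPort(), 1000, 3, 100)) {
            System.out.println("connected: " + socket.isConnected()); // connected: true
        }
    }
}
```

A Sender built on this helper would write its ObjectOutputStream to the returned socket exactly as before; only the connection step changes, so a briefly unavailable peer no longer causes an immediate failure.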


CHAPTER 11
REFERENCES
[1] Base Paper: G. Chen, T. Hu, D. Jiang, P. Lu, K.-L. Tan, H.T. Vo, and S. Wu, “BestPeer++: A
Peer-to-Peer Based Large-Scale Data Processing Platform.”
[2] K. Aberer, A. Datta, and M. Hauswirth, “Route Maintenance Overheads in DHT Overlays,” in
6th Workshop Distrib. Data Struct., 2004.
[3] A. Abouzeid, K. Bajda-Pawlikowski, D.J. Abadi, A. Rasin, and A. Silberschatz, “HadoopDB: An
Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads,” Proc. VLDB
Endowment, vol. 2, no. 1, pp. 922-933, 2009.
[4] C. Batini, M. Lenzerini, and S. Navathe, “A Comparative Analysis of Methodologies for
Database Schema Integration,” ACM Computing Surveys, vol. 18, no. 4, pp. 323-364, 1986.
[5] H. Garcia-Molina and W.J. Labio, “Efficient Snapshot Differential Algorithms for Data
Warehousing,” technical report, Stanford Univ., 1996.
[6] R. Huebsch, J.M. Hellerstein, N. Lanham, B.T. Loo, S. Shenker, and I. Stoica, “Querying the
Internet with PIER,” Proc. 29th Int’l Conf. Very Large Data Bases, pp. 321-332, 2003.
[7] H.V. Jagadish, B.C. Ooi, K.-L. Tan, Q.H. Vu, and R. Zhang, “Speeding up Search in
Peer-to-Peer Networks with a Multi-Way Tree Structure,” Proc. ACM SIGMOD Int’l Conf.
Management of Data, 2006.
[8] H.V. Jagadish, B.C. Ooi, K.-L. Tan, C. Yu, and R. Zhang, “iDistance: An Adaptive B+-Tree
Based Indexing Method for Nearest Neighbor Search,” ACM Trans. Database Systems, vol. 30,
pp. 364-397, June 2005.
[9] H.V. Jagadish, B.C. Ooi, and Q.H. Vu, “BATON: A Balanced Tree Structure for Peer-to-Peer
Networks,” Proc. 31st Int’l Conf. Very Large Data Bases (VLDB ’05), pp. 661-672, 2005.
[10] A. Lakshman and P. Malik, “Cassandra: Structured Storage System on a P2P Network,” Proc.
28th ACM Symp. Principles of Distributed Computing (PODC ’09), p. 5, 2009.
[11] W.S. Ng, B.C. Ooi, K.-L. Tan, and A. Zhou, “PeerDB: A P2P-Based System for Distributed
Data Sharing,” Proc. 19th Int’l Conf. Data Eng., pp. 633-644, 2003.
[12] V. Poosala and Y.E. Ioannidis, “Selectivity Estimation without the Attribute Value
Independence Assumption,” Proc. 23rd Int’l Conf. Very Large Data Bases (VLDB ’97),
pp. 486-495, 1997.
[13] P. Rodríguez-Gianolli, M. Garzetti, L. Jiang, A. Kementsietsidis, I. Kiringa, M. Masud,
R.J. Miller, and J. Mylopoulos, “Data Sharing in the Hyperion Peer Database System,” Proc.
Int’l Conf. Very Large Data Bases, pp. 1291-1294, 2005.
[14] Saepio Technologies Inc., “The Enterprise Marketing Management Strategy Guide,” White
Paper, 2010.
[15] I. Tatarinov, Z.G. Ives, J. Madhavan, A.Y. Halevy, D. Suciu, N.N. Dalvi, X. Dong,
Y. Kadiyska, G. Miklau, and P. Mork, “The Piazza Peer Data Management Project,” SIGMOD
Record, vol. 32, no. 3, pp. 47-52, 2003.
[16] A. Thusoo, J. Sarma, N. Jain, Z. Shao, P. Chakka, S. Anthony, H. Liu, P. Wyckoff, and
R. Murthy, “HIVE: A Warehousing Solution over a Map-Reduce Framework,” Proc. VLDB
Endowment, vol. 2, no. 2, pp. 1626-1629, 2009.
[17] S. Wu, S. Jiang, B.C. Ooi, and K.-L. Tan, “Distributed Online Aggregation,” Proc. VLDB
Endowment, vol. 2, no. 1, pp. 443-454, 2009.
[18] S. Wu, J. Li, B.C. Ooi, and K.-L. Tan, “Just-in-Time Query Retrieval over Partially
Indexed Data on Structured P2P Overlays,” Proc. ACM SIGMOD Int’l Conf. Management of Data
(SIGMOD ’08), pp. 279-290, 2008.
[19] S. Wu, Q.H. Vu, J. Li, and K.-L. Tan, “Adaptive Multi-Join Query Processing in PDBMS,”
Proc. IEEE Int’l Conf. Data Eng. (ICDE ’09), pp. 1239-1242, 2009.

Department of ISE, BIT Bangalore 2
