This idea became very popular in the late 1960s, but by the mid-1970s it had faded away, as it became clear that the IT-related technologies of the day were unable to sustain such a futuristic computing model. However, since the turn of the millennium, the concept has been revitalized, and it was during this period of revitalization that the term cloud computing began to emerge in technology circles.
Cloud computing is a model for enabling convenient, on-demand network access
to a shared pool of configurable computing resources (e.g., networks, servers,
storage, applications, and services) that can be rapidly provisioned and released
with minimal management effort or service provider interaction.
Broad network access: Capabilities are available over the network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, laptops, and PDAs) as well as other traditional or cloud-based software services.
Resource pooling: The provider's computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand.
2. The consumer is able to deploy and run arbitrary software, which can include operating systems and applications, on the provisioned cloud infrastructure.
The applications are accessible from various client devices through a thin
client interface such as a web browser (e.g., web-based email).
The consumer does not manage or control the underlying cloud infrastructure
including network, servers, operating systems, storage, or even individual
application capabilities, with the possible exception of limited user-specific application configuration settings.
Cloud Deployment Models:
Public Cloud
Private Cloud
Community Cloud
Hybrid Cloud
Public Cloud: The cloud infrastructure is made available to the general public
or a large industry group and is owned by an organization selling cloud
services.
Another type of VPN is commonly called a site-to-site VPN. Here the company invests in dedicated hardware to connect multiple sites to its LAN through a public network, usually the Internet. Site-to-site VPNs are either intranet-based or extranet-based.
Intranet
An intranet is a network based on TCP/IP protocols belonging to an organization, usually a corporation, and accessible only by the organization's members, employees, or others with authorization. Secure intranets are now the fastest-growing segment of the Internet because they are much less expensive to build and manage than private networks based on proprietary protocols.
Extranet
An extranet refers to an intranet that is partially accessible to authorized outsiders.
Whereas an intranet resides behind a firewall and is accessible only to people who
are members of the same company or organization, an extranet provides various
levels of accessibility to outsiders. You can access an extranet only if you have a
valid username and password, and your identity determines which parts of the
extranet you can view. Extranets are becoming a popular means for business
partners to exchange information.
Other options besides a VPN include dedicated private leased lines. Due to the high cost of dedicated lines, however, VPNs have become an attractive, cost-effective alternative.
Securing a VPN
If you're using a public line to connect to a private network, you might wonder what makes a virtual private network private. The answer is the manner in which the VPN is designed. A VPN is designed to provide a secure, encrypted tunnel in which to transmit the data between the remote user and the company network, so the information transmitted between the two locations via the encrypted tunnel cannot be read by anyone else.
VPN security contains several elements to secure both the company's private network and the outside network, usually the Internet, through which the remote user connects. The first step to security is usually a firewall, sitting between the client (the remote user's workstation) and the host server, which is the connection point to the private network. The remote user establishes an authenticated connection with the firewall.
VPN Encryption
Encryption is also an important component of a secure VPN. Encryption works by having all data sent from one computer encrypted in such a way that only the computer it is sending to can decrypt the data. Two types of encryption are commonly used. Public-key encryption uses two keys: a public key known to everyone and a private (secret) key known only to the recipient of the message. Symmetric-key encryption uses a single, common key shared by the sender and receiver of a message, which is used both to encrypt and to decrypt the message.
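As a minimal sketch of the symmetric-key case, the example below encrypts and decrypts a message with one shared key using the third-party Python cryptography package; the package choice and the sample payload are assumptions for illustration, not part of any particular VPN product.

# Symmetric-key encryption sketch using the "cryptography" package.
# Both ends of the tunnel would need to hold the same shared key.
from cryptography.fernet import Fernet

key = Fernet.generate_key()        # shared secret, distributed out of band
cipher = Fernet(key)

token = cipher.encrypt(b"payload sent through the VPN tunnel")
plaintext = cipher.decrypt(token)  # only a holder of `key` can do this
assert plaintext == b"payload sent through the VPN tunnel"

In public-key encryption the same idea applies, except that the data is encrypted with the recipient's public key and can only be decrypted with the matching private key.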
VPN Tunnelling
With a VPN you'll need to establish a network connection that is based on the idea
of tunnelling. There are two main types of tunnelling used in virtual private
networks. In voluntary tunnelling, the client first makes a connection to the service provider; the VPN client then creates the tunnel to the VPN server once that connection has been made. In compulsory tunnelling, the service provider manages the VPN connection and brokers the connection between the client and a VPN server.
Network Protocols for VPN Tunnels
There are three main network protocols for use with VPN tunnels, which are
generally incompatible with each other. They include the following:
IPSec
A set of protocols developed by the IETF to support secure exchange of packets at
the IP layer. IPsec has been deployed widely to implement VPNs. IPsec supports
two encryption modes: Transport and Tunnel. Transport mode encrypts only the
data portion (payload) of each packet, but leaves the header untouched. The more
secure Tunnel mode encrypts both the header and the payload. On the receiving
side, an IPsec-compliant device decrypts each packet. For IPsec to work, the
sending and receiving devices must share a public key. This is accomplished
through a protocol known as Internet Security Association and Key Management
Protocol/Oakley (ISAKMP/Oakley), which allows the receiver to obtain a public
key and authenticate the sender using digital certificates.
PPTP
Short for Point-to-Point Tunnelling Protocol, a technology for creating VPNs,
developed jointly by Microsoft, U.S. Robotics and several remote access vendor
companies, known collectively as the PPTP Forum. A VPN is a private network of
computers that uses the public Internet to connect some nodes. Because the
Internet is essentially an open network, PPTP is used to ensure that messages
transmitted from one VPN node to another are secure. With PPTP, users can dial in
to their corporate network via the Internet.
L2TP
Short for Layer 2 Tunnelling Protocol, an extension to the PPP protocol that
enables ISPs to operate Virtual Private Networks (VPNs). L2TP merges the best
features of two other tunnelling protocols: PPTP from Microsoft and L2F from
Cisco Systems. Like PPTP, L2TP requires that the ISP's routers support the
protocol.
VPN Equipment
Depending on the type of VPN you decide to implement, either remote-access or
site-to-site, you will need specific components to build your VPN. These standard
components include a software client for each remote workstation, dedicated
hardware, such as a firewall or a product like the Cisco VPN Concentrator, a VPN
server, and a Network Access Server (NAS).
Theory:
Google App Engine:
Architecture:
The Google App Engine (GAE) is Google's answer to the ongoing trend of Cloud Computing offerings within the industry. In the traditional sense, GAE is a web application hosting service, allowing for development and deployment of web-based applications within a pre-defined runtime environment. Unlike other cloud-based hosting offerings such as Amazon Web Services that operate on an IaaS level, the GAE already provides an application infrastructure on the PaaS level. This means that the GAE abstracts from the underlying hardware and operating system layers by providing the hosted application with a set of application-oriented services. While this approach is very convenient for developers of such applications, the rationale behind the GAE is its focus on scalability and on usage-based infrastructure and payment.
Costs :
Developing and deploying applications for the GAE is generally free of charge but restricted to a certain amount of traffic generated by the deployed application. Once this limit is reached within a certain time period, the application stops working. However, this limit can be waived by switching to a billable quota, where the developer can enter a maximum budget that can be spent on an application per day. Depending on the traffic, once the free quota is reached the application will continue to work until the maximum budget for that day is reached. Table 1 summarizes some of the most important quotas and the corresponding amount charged per unit once free resources are depleted and additional, billable quota is desired.
Features:
The GAE can be divided into three parts: the runtime environment, the datastore, and the App Engine services.
Runtime Environment
The GAE runtime environment is the place where the actual application is executed. However, the application is only invoked once an HTTP request is sent to the GAE via a web browser or some other interface, meaning that the application is not constantly running if no invocation or processing is taking place. In case of such an HTTP request, the request handler forwards the request and the GAE selects one out of many possible Google servers where the application is then instantly deployed and executed for a certain amount of time (8). The application may then do some computing and return the result to the GAE request handler, which forwards an HTTP response to the client. It is important to understand that the application runs completely embedded in this sandbox environment, but only as long as requests are still coming in or some processing is done within the application. The reason for this is simple: applications should only run when they are actually computing; otherwise they would allocate precious computing power and memory without need. This paradigm already shows the GAE's potential in terms of scalability: being able to run multiple instances of one application independently on different servers guarantees a decent level of scalability. However, this highly flexible and stateless application execution paradigm has its limitations. Requests are processed for no longer than 30 seconds, after which the response has to be returned to the client and the application is removed from the runtime environment again (8). Obviously this method accepts that, for deploying and starting an application each time a request is processed, an additional lead time is needed until the application is finally up and running. The GAE tries to counter this problem by caching the application in the server memory as long as possible, optimizing for several subsequent requests to the same application. The type of runtime environment on the Google servers depends on the programming language used. For Java, and for other languages that have support for Java-based compilers (such as Ruby, Rhino and Groovy), a Java Virtual Machine (JVM) is provided. The GAE also fully supports the Google Web Toolkit (GWT), a framework for rich web applications. For Python and related frameworks, a Python-based environment is used.
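To make the request/response cycle described above concrete, the following is a minimal sketch of a request handler in the Python runtime; it assumes the webapp2 framework bundled with the classic Python 2.7 runtime, and the route and response text are purely illustrative.

# Minimal App Engine request handler sketch (classic Python 2.7 runtime, webapp2).
# An application instance only runs while requests such as this one are being served.
import webapp2

class MainPage(webapp2.RequestHandler):
    def get(self):
        # Do some computing, then hand the HTTP response back to the GAE front end.
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.write('Hello from the App Engine runtime')

app = webapp2.WSGIApplication([('/', MainPage)], debug=True)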
Datastore
As previously discussed, the stateless execution of applications creates the need for a datastore that provides a proper way of persisting data. Traditionally, the most popular way of persisting data in web applications has been the use of relational databases. However, with its focus on high flexibility and scalability, the GAE uses a different approach for data persistence, called Bigtable.
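The sketch below illustrates the style of persistence the datastore offers, assuming the ndb client library of the Python runtime; the entity kind and its properties are invented for illustration.

# Datastore persistence sketch using the ndb client library (Python runtime).
# Entities are schemaless objects stored on top of Bigtable rather than relational rows.
from google.appengine.ext import ndb

class Greeting(ndb.Model):
    content = ndb.StringProperty()
    created = ndb.DateTimeProperty(auto_now_add=True)

def save_and_load():
    key = Greeting(content='Hello, datastore').put()  # write the entity
    return key.get()                                  # read it back by key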
Services
The platform's built-in memory cache service (Memcache) serves as short-term storage. As its name suggests, it stores data in a server's memory, allowing for faster access compared to the datastore. Memcache is a non-persistent data store that should only be used to store temporary data within a series of computations. Probably the most common use case for Memcache is to store session-specific data (15). Persisting session information in the datastore and executing queries on every page interaction is highly inefficient over the application lifetime, since session-owner instances are unique per session (16). Moreover, Memcache is well suited to speed up common datastore queries (8). To interact with Memcache, the GAE supports JCache, a proposed interface standard for memory caches.
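A typical read-through caching pattern in the Python runtime might look like the sketch below; the key format and the datastore helper passed in are hypothetical.

# Read-through caching sketch with the App Engine memcache API (Python runtime).
from google.appengine.api import memcache

def get_profile(user_id, load_from_datastore):
    # load_from_datastore is a hypothetical callable that queries the datastore.
    cache_key = 'profile:%s' % user_id
    profile = memcache.get(cache_key)
    if profile is None:
        profile = load_from_datastore(user_id)
        # Keep the result in memory for an hour; memcache may evict it earlier.
        memcache.add(cache_key, profile, time=3600)
    return profile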
Experiment No. 5
What is Hadoop?
Organizations can deploy Hadoop components and supporting software packages
in their local data center. However, most big data projects depend on short-term
use of substantial computing resources. This type of usage is best-suited to highly
scalable public cloud services, such as Amazon Web Services
(AWS), Google Cloud Platform and Microsoft Azure. Public cloud providers often support Hadoop components through basic services, such as AWS Elastic Compute Cloud and Simple Storage Service instances. However, there are also services tailored specifically for Hadoop-type tasks, such as AWS Elastic MapReduce, Google Cloud Dataproc and Microsoft Azure HDInsight.
Hadoop modules and projects
As a software framework, Hadoop is composed of numerous functional modules.
At a minimum, Hadoop uses Hadoop Common as a kernel to provide the
framework's essential libraries. Other components include Hadoop Distributed File
System (HDFS), which is capable of storing data across thousands of commodity
servers to achieve high bandwidth between nodes; Hadoop Yet Another Resource Negotiator (YARN), which provides resource management and scheduling for user applications; and Hadoop MapReduce, which provides the programming model used to tackle large distributed data processing -- mapping data and reducing it to a result.
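To make the mapping and reducing steps concrete, the sketch below shows a word-count job written for Hadoop Streaming in Python; the file names and the sample data flow are illustrative only.

#!/usr/bin/env python
# mapper.py -- reads raw text from stdin and emits one (word, 1) pair per output line.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print('%s\t%d' % (word, 1))

#!/usr/bin/env python
# reducer.py -- sums the counts per word; Hadoop Streaming sorts mapper output by key first.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip('\n').split('\t', 1)
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print('%s\t%d' % (current_word, current_count))
        current_word, current_count = word, int(count)
if current_word is not None:
    print('%s\t%d' % (current_word, current_count))

Such scripts would typically be launched with the hadoop-streaming JAR, passing mapper.py and reducer.py through the -mapper and -reducer options.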
Hadoop also supports a range of related projects that can complement and extend
Hadoop's basic capabilities. Complementary software packages include:
Apache Flume: a tool used to collect, aggregate and move huge amounts of streaming data into HDFS;
Apache HBase: an open source, nonrelational, distributed database;
Apache Hive: a data warehouse that provides data summarization, query and analysis;
Apache Sqoop: a tool to transfer bulk data between Hadoop and structured data stores, such as relational databases;
Apache Spark: a fast engine for big data processing capable of streaming and supporting SQL, machine learning and graph processing.
Theory:
Why Amazon Web Services?
The Benefits
Moving database backups to AWS has provided Amazon.com with a number of benefits, including:
Elimination of tape capacity planning. Amazon.com is growing larger and more dynamic each year, both organically and as a result of acquisitions.
AWS has enabled Amazon.com to keep pace with this rapid expansion, and to do
so seamlessly. Historically, Amazon.com business groups have had to write
annual backup plans, quantifying the amount of tape storage that they plan to use
for the year and the frequency with which they will use the tape resources. These
plans are then used to charge each organization for their tape usage, spreading
the cost among many teams. With Amazon S3, teams simply pay for what they
use, and are billed for their usage as they go. There are virtually no upper limits
as to how much data can be stored in Amazon S3, and so there are no worries
about running out of resources. For teams adopting Amazon S3 backups, the need
for formal planning has been all but eliminated.
Backing up a database to Amazon S3 can be two to twelve times faster than with
tape drives. As one example, in a benchmark test a DBA was able to restore 3.8
terabytes in 2.5 hours over gigabit Ethernet. This amounts to 25 gigabytes per
minute, or 422 MB per second. In addition, since Amazon.com uses RMAN data compression, the effective restore rate was 3.37 gigabytes per second. This 2.5 hours compares to, conservatively, the 10-15 hours that would be required to restore from tape.
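A quick back-of-the-envelope check of those figures (using decimal units, 1 TB = 1000 GB) is sketched below.

# Sanity check of the quoted restore rates (decimal units: 1 TB = 1000 GB, 1 GB = 1000 MB).
terabytes = 3.8
hours = 2.5

gb_per_minute = terabytes * 1000 / (hours * 60)            # ~25.3 GB/min
mb_per_second = terabytes * 1000 * 1000 / (hours * 3600)   # ~422 MB/s
print(round(gb_per_minute, 1), round(mb_per_second))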
Easy implementation of Oracle RMAN backups to Amazon S3. The DBAs found
it easy to start backing up their databases to Amazon S3. Directing Oracle RMAN
backups to Amazon S3 requires only configuration of the Oracle Secure Backup
Cloud (SBC) module. The effort required to configure the Oracle SBC module
amounted to an hour or less per database. After this one-time setup, the database
backups were transparently redirected to Amazon S3.
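For scripts that push backup files to S3 directly rather than through RMAN, the upload itself can be as simple as the sketch below; it assumes the boto3 SDK, and the bucket name, key, and file path are hypothetical.

# Uploading a backup file to Amazon S3 with the boto3 SDK (all names are illustrative).
import boto3

s3 = boto3.client('s3')
s3.upload_file(
    Filename='/backups/orders_db_20240101.dmp',   # local backup file (hypothetical)
    Bucket='example-database-backups',            # hypothetical bucket
    Key='oracle/orders_db_20240101.dmp',
)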
Durable data storage provided by Amazon S3, which is designed for 11 nines
durability. On occasion, Amazon.com has experienced hardware failures with tape
infrastructure – tapes that break, tape drives that fail, and robotic components that
fail. Sometimes this happens when a DBA is trying to restore a database, and
dramatically increases the mean time to recover (MTTR). With the durability and
availability of Amazon S3, these issues are no
longer a concern.
Elimination of physical tape transport to an off-site location. Any company that has been storing Oracle backup data offsite should take a hard look at the costs involved in transporting, securing and storing its tapes offsite; these costs can be reduced or possibly eliminated by storing the data in Amazon S3.
As the world's largest online retailer, Amazon.com continuously innovates in order to provide an improved customer experience and offer products at the lowest possible prices. One such innovation has been to replace tape with Amazon S3 storage for database backups. This innovation is one that can be easily replicated by other organizations that back up their Oracle databases to tape.
Amazon Relational Database Service (RDS)
Amazon Relational Database Service is a web service that makes it easy to set up,
operate, and scale a relational database in the cloud.
Amazon ElastiCache
Amazon ElastiCache is a web service that makes it easy to deploy, operate, and scale an in-memory cache in the cloud.
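As an illustration of how an RDS instance can be provisioned programmatically, the sketch below uses the boto3 SDK; the identifier, instance class, engine, credentials and storage size are assumptions, not recommendations.

# Creating a small RDS database instance with boto3 (all values are illustrative).
import boto3

rds = boto3.client('rds')
rds.create_db_instance(
    DBInstanceIdentifier='example-instance',
    DBInstanceClass='db.t3.micro',
    Engine='mysql',
    MasterUsername='admin',
    MasterUserPassword='change-me-please',
    AllocatedStorage=20,   # in GiB
)

ElastiCache clusters can be created in a similar style through the elasticache client's create_cache_cluster call.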
AWS Elastic Beanstalk is an even easier way to quickly deploy and manage
applications in the AWS cloud. We simply upload our application, and Elastic
Beanstalk automatically handles the deployment details of capacity provisioning,
load balancing, auto-scaling, and application health monitoring.
AWS CloudFormation is a service that gives developers and businesses an easy
way to create a collection of related AWS resources and provision them in an
orderly and predictable fashion.
The Alexa Web Information Service makes Alexa's huge repository of data about structure and traffic patterns on the Web available to developers.
Azure or AWS?
With the most data center regions around the globe, consistent hybrid cloud capabilities, and comprehensive AI services, Microsoft positions Azure as the right choice for your business and points to the organisations worldwide that are choosing Azure.
One of the key features of Aneka is its ability to provide different ways of expressing distributed applications by offering different programming models; execution services are mostly concerned with providing the middleware with an implementation of these models. Additional services such as persistence and security are transversal to the entire stack of services hosted by the Container. At the application level, a set of different components and tools is provided to: 1) simplify the development of applications (SDK); 2) port existing applications to the Cloud; and 3) monitor and manage the Aneka Cloud.