Cloud Computing Notes
Mr. M. VENGATESHWARAN
Department: CSE
Publications:
Awards – 06
Books Published – 08
International Journals – 31
International Conferences – 42
National Conferences – 32
History:
Before cloud computing emerged, there was client/server computing with centralized storage, in which all
the data, software applications, and controls reside on the server side.
If a user wants to run a program or access specific data, he connects to the server, gains the
appropriate access, and does his business. The concept of distributed computing came after this, where all
the computers are networked together and resources are shared when needed.
The cloud computing concept came into the picture around the 1950s with the implementation of
mainframe computers, accessible via thin/static clients. Then in 1961, John McCarthy delivered a
speech at MIT in which he suggested that computing could be sold as a utility, like electricity.
The idea was great, but it was much ahead of its time; despite interest in the model, the
technology at that time was not ready for it.
In 1999, Salesforce.com became the first company to enter the cloud arena, pioneering the concept of
providing enterprise-level applications to end users through the Internet.
Then in 2002, Amazon came up with Amazon Web Services, providing services like computation,
storage, and even human intelligence.
In 2009, Google Apps and Microsoft’s Windows Azure also started to provide cloud computing
enterprise applications. Other companies like HP and Oracle also joined the stream of cloud
computing, for fulfilling the need for greater data storage.
1. Public cloud:
A public cloud is owned by a third-party service provider and made available over the Internet; any user who subscribes to (pays for) the service can access it.
2. Private Cloud
It is used internally by a single organization; anyone within the organization can access data, services, and
web applications easily through the local server and local network, but users outside the organization
cannot access them.
This model runs on an intranet.
It is completely managed by the organization’s own staff.
Adv:
Speed of access is high
More secure
Does not require internet connection
Dis-Adv:
Implementation cost is high
It requires an administrator
Scalability is very limited
3. Hybrid Cloud
In a hybrid cloud, applications can easily be moved from one cloud to another. A hybrid
cloud is a combination of public and private clouds that supports an organization's requirement to handle its
data.
It runs both online and offline.
A hybrid cloud may lack the flexibility, security, and certainty of purely in-house applications.
4. Community Cloud
Companies having similar interests and work can share the same cloud; this can be done with the
help of a community cloud. The initial investment is reduced, as the setup is shared among the participants.
It is managed by a third party.
Examples of well-known companies that run their services on the cloud include:
Netflix
Pinterest
Xerox
Instagram
Apple
Google
Facebook
1. SAAS
SaaS stands for Software as a Service, provides a facility to the user to use the software from anywhere
with the help of an internet connection. It is also known as software on demand.
The remote access is possible because of service providers, host applications and their associated data
at their location.
There are various benefits of the SaaS as it is economical and only the user has to pay for some of the
basic costs such as licensing fees, installation costs, maintenance fees, and support fees.
Some of the examples of SaaS are Yahoo! Mail, Hotmail, and Gmail.
3. IAAS
IaaS stands for Infrastructure as a Service.
With the help of IaaS, the user can use IT hardware and software by paying just the basic price for it.
Companies that provide IaaS include IBM, Google, and Amazon. With the help of virtualization, the host
can create and manage the infrastructure resources in the cloud.
For small start-ups and firms, IaaS has a major advantage: it provides them with the
infrastructure without their having to spend a large amount of money on hardware.
IaaS is chosen because it is easier, faster, and more cost-efficient, which reduces the burden on
organizations.
Features of Cloud :
Disadvantages
Applications:
Cloud computing is all about renting computing services. This idea first emerged in the 1950s.
Five technologies played a vital role in making cloud computing what it is today:
distributed systems and their peripherals, virtualization, Web 2.0, service orientation, and
utility computing.
Distributed System
A parallel system contains more than one processor having direct memory access to the shared memory that
can form a common address space. Usually, a parallel system is of a Uniform Memory Access (UMA)
architecture. In UMA architecture, the access latency (processing time) for accessing any particular location
of a memory from a particular processor is the same. Moreover, the processors are also configured to be in
close proximity and are connected by an interconnection network. Conventionally, interprocessor
communication between the processors happens through either read or write operations on the shared
memory, even though the usage of message passing is also possible (with emulation on the
shared memory). Moreover, the hardware and software are tightly coupled, and usually the processors in
such a network run on the same operating system. In general, the processors are homogeneous
and are installed within the same container of the shared memory. A multistage switch/bus containing a
regular and symmetric design is used for greater efficiency.
The following diagram represents a UMA parallel system with multiple processors connecting to multiple
memory units through network connection.
A multicomputer parallel system is another type of parallel system containing multiple processors configured
without direct access to a shared memory, so that the processors do not form a common address space.
Array processors exchange information by passing messages. Array processors have a very small market,
owing to the fact that they perform closely synchronized data processing; the data is exchanged in
lock-step for applications such as digital signal processing and image processing. Such applications can
also involve large iterations on the data.
Compared to the UMA and array processor architectures, NUMA as well as message-passing multicomputer
systems are less preferred when a high degree of shared data access and communication is required. The
primary benefit of having parallel systems is to derive better throughput by sharing the computational tasks
between multiple processors. Tasks that can easily be partitioned into multiple subtasks and that need little
communication between them are the best candidates for such systems.
DISTRIBUTED COMPUTING
Distributed computing is the concurrent usage of more than one connected computer to solve a problem
over a network connection. The computers that take part in distributed computing appear as a single machine
to their users.
Distributing computation across multiple computers is a great approach when these computers are observed
to interact with each other over the distributed network to solve a bigger problem in reasonably less latency.
In many respects, this sounds like a generalization of the concepts of parallel computing that we discussed in
the previous section. The purpose of enabling distributed systems includes the ability to confront a problem
that is either bigger or longer to process by an individual computer.
Distributed computing, the latest trend, is performed on a distributed system, which is considered to be a
group of computers that do not share a common physical clock or a shared memory, that interact using
information exchanged over a communication (inter/intra) network, and in which each computer has its own
memory and runs its own operating system. Usually, the computers are semi-autonomous, loosely coupled,
and cooperate to address a problem collectively.
Given the number of frames to render for a full-length feature (30 frames per second over a 2-hour movie,
which is a lot!), movie studios have the requirement of spreading the full rendering job across many computers.
Other applications requiring a distributed system configuration are instant messaging and
video conferencing. Having the ability to solve such problems, along with improved
performance, is the reason for choosing distributed systems.
The devices that can take part in distributed computing include server machines, workstations, and personal
handheld devices.
Capabilities of distributed computing include integrating heterogeneous applications that are developed and
run on different technologies and operating systems, multiple applications sharing common resources, a
single instance service being reused by multiple clients, and having a common user interface for multiple
applications.
Some examples of parallel computing include weather forecasting, movie special effects, and
desktop computer applications.
In parallel computing systems, as the number of processors increases, with enough parallelism
available in applications, such systems easily beat sequential systems in performance through
the shared memory. In such systems, the processors can also contain their own locally
allocated memory, which is not available to any other processors.
In distributed computing systems, multiple system processors can communicate with each
other using messages that are sent over the network. Such systems are increasingly available
these days because of the availability of low-priced computer processors and the high-
bandwidth links to connect them.
The following reasons explain why a system should be built distributed, not just parallel:
Scalability: As distributed systems do not have the problems associated with shared memory,
with the increased number of processors, they are obviously regarded as more scalable than
parallel systems.
Reliability: The impact of the failure of any single subsystem or a computer on the network of
computers defines the reliability of such a connected system. Definitely, distributed systems
demonstrate a better aspect in this area compared to the parallel systems.
Data sharing: Data sharing provided by distributed systems is similar to the data sharing
provided by distributed databases. Thus, multiple organizations can have distributed systems
with the integrated applications for data exchange.
Resources sharing: If there exists an expensive, special-purpose resource or processor,
which cannot be dedicated to each processor in the system, such a resource can be easily
shared across distributed systems.
Economic: With the evolution of modern computers, high-bandwidth networks and low-cost
processors have made distributed systems an economical alternative to large centralized systems.
CHARACTERISTICS OF CLOUD COMPUTING
1. On-demand self-service:
Cloud computing services do not require any human administrators; users themselves are
able to provision, monitor, and manage computing resources as needed.
2. Broad network access:
The computing services are generally provided over standard networks and to
heterogeneous client devices.
3. Rapid elasticity:
The computing services should have IT resources that are able to scale out and in quickly and
on an as-needed basis. Whenever the user requires services, they are provided to him, and they are
scaled back as soon as the requirement is over.
4. Resource pooling:
The IT resources (e.g., networks, servers, storage, applications, and services) present are shared
across multiple applications and tenants in an uncommitted manner. Multiple clients are
provided service from the same physical resource.
5. Measured service:
Resource utilization is tracked for each application and tenant; this provides both the
user and the resource provider with an account of what has been used. This is done for various
reasons like monitoring, billing, and effective use of resources.
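The following is a minimal sketch of how a measured-service bill might be computed from tracked usage; the resource names, per-unit rates, and usage figures are purely illustrative assumptions, not real provider prices.

    # Hypothetical metering/billing sketch: rates and usage values are illustrative only.
    RATES = {
        "compute_hours": 0.10,      # assumed price per VM-hour
        "storage_gb_month": 0.02,   # assumed price per GB-month of storage
        "data_transfer_gb": 0.05,   # assumed price per GB transferred out
    }

    def compute_bill(usage):
        """Return an itemized bill and total for one tenant from metered usage."""
        items = {resource: quantity * RATES[resource] for resource, quantity in usage.items()}
        return items, sum(items.values())

    tenant_usage = {"compute_hours": 720, "storage_gb_month": 50, "data_transfer_gb": 10}
    items, total = compute_bill(tenant_usage)
    print(items, "total =", round(total, 2))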
ELASTICITY IN CLOUD
Elasticity is the ability to grow or shrink infrastructure resources dynamically as
needed to adapt to workload changes in an autonomic manner, maximizing the use of
resources. This can result in savings in infrastructure costs overall. Not everyone can
benefit from elastic services though. Environments that do not experience sudden or
cyclical changes in demand may not benefit from the cost savings elastic services offer.
Use of “Elastic Services” generally implies that all resources in the infrastructure are elastic.
This includes, but is not limited to, hardware, software, QoS and other policies,
connectivity, and other resources that are used in elastic applications. This may become
a negative trait where certain applications must have guaranteed
performance. It depends on the environment.
A use case that could easily have the need for cloud elasticity would be retail,
with its increased seasonal activity. For example, during the holiday season, Black
Friday spikes and special sales can place a sudden increased
demand on the system. Instead of spending budget on additional permanent
infrastructure capacity to handle a couple months of high load out of the year, this is a
good opportunity to use an elastic solution. The additional infrastructure to handle the
increased volume is only used in a pay-as-you-grow model and then “shrinks” back to a
lower capacity for the rest of the year. This also allows for additional sudden and
unanticipated sales activities throughout the year if needed without impacting
performance or availability. This can give IT managers the security of unlimited
headroom when needed. This can also be a big cost savings to retail companies looking
to optimize their IT spend if packaged well by the service provider.
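As a rough illustration of the elastic, pay-as-you-grow behaviour described above, the sketch below scales a worker pool up and down based on observed load; the thresholds, pool limits, and the get_current_load() helper are hypothetical placeholders for a real monitoring metric.

    # Conceptual autoscaling loop (sketch): grow the pool during spikes, shrink it afterwards.
    import random
    import time

    MIN_WORKERS, MAX_WORKERS = 2, 20
    SCALE_UP_AT, SCALE_DOWN_AT = 0.75, 0.25   # assumed utilization thresholds

    def get_current_load():
        # Placeholder for a real metric source (e.g., average utilization per worker).
        return random.random()

    def autoscale(workers, load):
        """Return the new worker count for the observed utilization."""
        if load > SCALE_UP_AT and workers < MAX_WORKERS:
            return workers + 1          # scale out during a spike (e.g., Black Friday)
        if load < SCALE_DOWN_AT and workers > MIN_WORKERS:
            return workers - 1          # scale back in to save cost
        return workers

    workers = MIN_WORKERS
    for _ in range(10):                 # one iteration per monitoring interval
        load = get_current_load()
        workers = autoscale(workers, load)
        print(f"load={load:.2f} -> workers={workers}")
        time.sleep(0.1)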
On-demand Provisioning
On-demand computing is a delivery model in which computing resources are
made available to the user as needed. ... When the services are provided by a third
party, the term cloud computing is often used as a synonym for on-demand
computing.
This layer consists of two distinct but related areas: resource abstraction and control layer.
The Resource Abstraction Layer primarily deals with virtualization. The Virtualization Essentials course
defines the concept of virtualization as "a set of techniques for hiding hardware resources behind software
abstractions to simplify the way other software or end users interact with those resources."
The manipulation of the software-abstracted resources enables greater functionality and easier
configuration.
This is what enables the cloud elasticity and automation.
The hypervisor and storage area networks (SAN) are two examples of this concept.
The Physical Resource Layer covers all of the traditional hardware resources that underpin the IT
infrastructure.
This layer consists of physical servers (CPU, memory, bus architecture), disks and storage arrays, network
wiring, switches, and routers.
This layer also covers the physical data center facility components such as heating, ventilation, and air
conditioning (HVAC), electrical power, backup generators, and fuel; physical control of data centers by IT
staff and contractors; and cabling to outside cloud carriers, phone communication, etc.
Cloud Service Management is a set of processes and activities a cloud provider must perform in order to
satisfactorily deliver cloud service to consumers.
These apply equally to a public cloud provider and a private cloud provider.
NIST groups these processes and activities into three broad areas: Business Support, Provisioning and
Configuration, and Portability and Interoperability.
o The Business Support processes are business-oriented and focus on the business operations of a
cloud provider as they relate to the delivery of cloud services to cloud consumers. There are six
key functions.
Customer Management:
This area covers the activities necessary to manage and maintain the relationship with the cloud
consumer. It deals with items such as customer accounts, complaints and issues, customer contact
information, history of customer interactions, etc.
Contract Management:
This process focuses on the management of contracts between the cloud provider and consumer. This is
implemented via Service Level Agreements (SLAs). Consumers generally pick the level of SLA that
meets their requirements and budget.
Inventory Management:
This process manages the definitive set of cloud services offered to cloud consumers. It establishes a
service catalog and is the primary interface for the consumer to engage with the cloud provider.
Rapid Provisioning:
A cloud provider must be able to quickly respond to varying workload demands. This includes scaling
up as well as scaling down. This must be fully automated and requires a scriptable, virtualized
infrastructure.
Resource Changing:
To support rapid elasticity, the provider must implement changes to its underlying resources effectively
and speedily, primarily through automation. These changes include replacing broken components,
upgrading components, adding greater capacity, and reconfiguring existing components.
SLA Management:
The cloud provider must ensure that it is meeting its contractual obligations to its customers. Ongoing
management of SLA targets and operational level targets are performed to maintain a high quality of
service.
Data Portability:
A cloud provider must provide a mechanism to move large amounts of data into and out of the
provider's cloud environment. For example, in a SaaS environment, the cloud consumer must be able to
upload, in bulk, existing HR records into an HR SaaS application.
The consumer must also be able to export in bulk from the HR SaaS application back to their own data
center. Failure to provide easy and reliable transfer mechanisms will discourage the adoption of cloud
services.
Service Interoperability:
When a cloud provider adheres to well-known and accepted technology standards, it is easier for
consumers to develop and deploy cloud solutions that span across more than one cloud provider's
environment.
For a cloud consumer, service interoperability delivers greater disaster recovery resiliency by removing
a single point of failure (i.e. the cloud provider) and greater resource capacity by spreading the
workload across several providers' IaaS resources.
System Portability:
This capability enables a consumer to move or migrate infrastructure resources, like virtual machines
and applications, easily from one cloud provider to another.
As in data portability, this enables a smoother exit strategy that protects a consumer from an
unexpected, long-term disruption of a cloud provider's services.
The traditional confidentiality-integrity-availability (CIA) areas of security still need to be addressed in each of
the three service layers (IaaS, PaaS, SaaS). For example, an IaaS provider needs to ensure that the hypervisor is
secure and well-configured.
Incident response: A well-structured security process to deal with breaches with strong communication
channels is necessary to minimize the impact of any security incident.
A cloud provider must ensure that consumer data stored in the cloud environment is protected and private to
the consumer. If the cloud provider collects data about the consumer, or the consumer's activities and behavior
patterns, then they must ensure that the collected data is fully protected and remains private, and cannot be
accessed by anyone other than the consumer.
A cloud provider must explicitly guarantee that a consumer's data remains in a well-defined geographical
location with explicit acknowledgement of the consumer.
A cloud broker is an optional cloud player in the delivery of cloud services. NIST defines a cloud broker as an
entity that acts as an intermediary between the consumer and provider.
A cloud broker is involved in a cloud service delivery when a consumer chooses not to directly manage or
operate the usage of a cloud service.
A cloud broker can function in one or more of the following scenarios.
Service Intermediation
Service Intermediation is when a broker performs value-add service on behalf of the consumer. For example,
in following figure, the cloud broker performs some administrative or management function on behalf of the
consumer for a particular cloud service.
This value-add service may include activities such as invoice management, invoice and usage reconciliation,
and end-user account management, etc.
Service Aggregation
Service Aggregation is when a broker integrates two or more cloud services to provide a complex cloud
solution to the consumer. Following figure illustrates a cloud service that is composed of three different cloud
provider's services.
Service Arbitrage
Service Arbitrage is when a broker dynamically selects the best cloud service provider in real time. Following
figure illustrates a broker checking for the best cloud service, for example online storage, from three cloud
providers.
PUBLIC CLOUDS
A public cloud is built over the Internet and can be accessed by any user who has paid for the service. Public
clouds are owned by service providers and are accessible through a subscription. The callout box at the top of
Figure 4.1 shows the architecture of a typical public cloud. Many public clouds are available, including Google
App Engine (GAE), Amazon Web Services (AWS), Microsoft Azure, IBM Blue Cloud, and Salesforce.com’s
Force.com. The providers of the aforementioned clouds are commercial providers that offer a publicly
accessible remote interface for creating and managing VM instances within their proprietary infrastructure. A
public cloud delivers a selected set of business processes. The application and infrastructure services are
offered on a flexible price-per-use basis.
PRIVATE CLOUDS
A private cloud is built within the domain of an intranet owned by a single organization. Therefore, it is client
owned and managed, and its access is limited to the owning clients and their partners. Its deployment was not
meant to sell capacity over the Internet through publicly accessible interfaces. Private clouds give local users a
flexible and agile private infrastructure to run service workloads within their administrative domains. A private
cloud is supposed to deliver more efficient and convenient cloud services. It may impact the cloud
standardization, while retaining greater customization and organizational control.
HYBRID CLOUDS
A hybrid cloud is built with both public and private clouds, as shown at the lower-left corner of Figure 4.1.
Private clouds can also support a hybrid cloud model by supplementing local infrastructure with computing
capacity from an external public cloud. For example, the Research Compute Cloud (RC2) is a private cloud,
built by IBM, that interconnects the computing and IT resources at eight IBM Research Centers scattered
throughout the United States, Europe, and Asia. A hybrid cloud provides access to clients, the partner network,
and third parties. In summary, public clouds promote standardization, preserve capital investment, and offer
application flexibility. Private clouds attempt to achieve customization and offer higher efficiency, resiliency,
security, and privacy. Hybrid clouds operate in the middle, with many compromises in terms of resource
sharing.
To be able to develop, deploy, and manage the execution of applications using provisioned resources demands
a cloud platform with the proper software environment. Such a platform includes operating system and
runtime library support. This has triggered the creation of the PaaS model to enable users to develop and
deploy their user applications. Table 4.2 highlights cloud platform services offered by five PaaS services. The
platform cloud is an integrated computer system consisting of both hardware and software infrastructure. The
user application can be developed on this virtualized cloud platform using some programming languages and
software tools supported by the provider (e.g., Java, Python, .NET). The user does not manage the underlying
cloud infrastructure. The cloud provider supports user application development and testing on a well-defined
service platform. This PaaS model enables a collaborated software development platform for users from
different parts of the world. This model also encourages third parties to provide software management,
integration, and service monitoring solutions.
• On-demand self-service: Users are able to provision cloud computing resources without requiring human
interaction, mostly done through a web-based self-service portal (management console).
• Broad network access: Cloud computing resources are accessible over the network, supporting
heterogeneous client platforms such as mobile devices and workstations.
• Resource pooling: Serve multiple customers from the same physical resources by securely separating the
resources at a logical level.
• Rapid elasticity: Resources are provisioned and released on-demand and/or automated based on triggers or
parameters. This will make sure your application will have exactly the capacity it needs at any point of time.
• Measured service: Resource usage is monitored, measured, and reported (billed) transparently based on
utilization. In short, pay for use.
As shown in the following figure, cloud computing is composed of five essential characteristics, three
service models, and four deployment models:
Amazon Elastic Compute Cloud (EC2) is a key web service that provides a facility to create and manage
virtual machine instances with operating systems running inside them. There are three ways to pay for EC2
virtual machine instances, and businesses may choose the one that best fits their requirements. An on-demand
instance provides a virtual machine (VM) whenever you need it, and terminates it when you do not. A reserved
instance allows the user to purchase a VM and prepay for a certain period of time. A spot instance can be
purchased through bidding, and can be used only as long as the bid price exceeds the current spot price. Another
convenient feature of Amazon’s cloud is that it allows hosting services across multiple geographical
locations, helping to reduce network latency for a geographically distributed customer base.
Amazon Relational Database Service (RDS) provides MySQL and Oracle database services in the cloud.
Amazon S3 is a redundant and fast cloud storage service that provides public access to files over HTTP. Amazon
SimpleDB is a very fast, unstructured NoSQL database.
Amazon Simple Queuing Service (SQS) provides a reliable queuing mechanism with which application
developers can queue different tasks for background processing.
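A small sketch of that queuing pattern using the boto3 SDK: one piece of code enqueues a background task and another polls for it. The queue name, region, and message body are illustrative, and AWS credentials are assumed to be configured in the environment.

    # Sketch: queue a background task in Amazon SQS and read it back (requires boto3).
    import boto3

    sqs = boto3.client("sqs", region_name="us-east-1")   # region is an assumption

    queue_url = sqs.create_queue(QueueName="demo-background-tasks")["QueueUrl"]

    # Producer: enqueue a task description for background processing.
    sqs.send_message(QueueUrl=queue_url, MessageBody="resize-image:photos/tree.jpg")

    # Consumer: poll for a task, process it, then delete it from the queue.
    resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=5)
    for msg in resp.get("Messages", []):
        print("processing:", msg["Body"])
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])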
Multiple VMs can share CPUs and main memory in cloud computing, but I/O sharing is problematic.
Internet applications continue to become more data-intensive. If we assume applications to be “pulled
apart” across the boundaries of clouds, this may complicate data placement and transport.
Cloud users and providers have to think about the implications of placement and traffic at every level
of the system, if they want to minimize costs.
This kind of reasoning can be seen in Amazon’s development of its new CloudFront service. Therefore,
data transfer bottlenecks must be removed, bottleneck links must be widened, and weak servers should
be removed.
The pay-as-you-go model applies to storage and network bandwidth; both are counted in terms of the
number of bytes used.
Computation is different depending on virtualization level. GAE automatically scales in response to
load increases and decreases; users are charged by the cycles used.
AWS charges by the hour for the number of VM instances used, even if the machine is idle. The
opportunity here is to scale quickly up and down in response to load variation, in order to save money,
but without violating SLAs.
Many cloud computing providers originally relied on open source software because the licensing model
for commercial software is not ideal for utility computing.
The primary opportunity is either for open source to remain popular or simply for commercial software
companies to change their licensing structure to better fit cloud computing.
One can consider using both pay-for-use and bulk-use licensing schemes to widen the business
coverage.
One customer’s bad behavior can affect the reputation of the entire cloud.
Another legal issue concerns the transfer of legal liability. Cloud providers want legal liability to
remain with the customer, and vice versa.
This problem must be solved at the SLA level. We will study reputation systems for protecting data
centers in the next section.
BASICS OF VIRTUALIZATION
Virtualization is the creation of a virtual version of a hardware platform, OS, network, storage, etc.
It allows multiple operating systems to run on a single physical machine, called the host machine.
Each instance of an OS is called a virtual machine (VM).
The machine on which the virtual machine is created is known as the host machine, and the virtual machine is
referred to as a guest machine. The virtual machine is managed by a software or firmware layer
known as the hypervisor.
Purpose of virtualization:
Virtualization is a technique which allows a single physical instance of an application or resource to be shared
among multiple organizations or tenants (customers). It does so by assigning a logical name to a physical
resource and providing a pointer to that physical resource on demand.
Virtualization Concept
Creating a virtual machine over an existing operating system and hardware is referred to as hardware
virtualization. Virtual machines provide an environment that is logically separated from the underlying
hardware.
Hypervisor
The hypervisor is firmware or a low-level program that acts as a Virtual Machine Manager. There are
two types of hypervisor:
A Type 1 hypervisor executes on the bare system. LynxSecure, RTS Hypervisor, Oracle VM, Sun xVM
Server, and VirtualLogic VLX are examples of Type 1 hypervisors. The following diagram shows the Type
1 hypervisor.
A Type 2 hypervisor runs as an application on top of a host operating system; VMware Workstation and
Oracle VirtualBox are common examples.
Benefits of Virtualization
Virtualization can increase IT agility, flexibility, and scalability while creating significant cost
savings. Workloads get deployed faster, performance and availability increase, and operations become
automated, resulting in IT that is simpler to manage and less costly to own and operate.
Reduce capital and operating costs.
Deliver high application availability.
Minimize or eliminate downtime.
Increase IT productivity, efficiency, agility and responsiveness.
Speed and simplify application and resource provisioning.
Support business continuity and disaster recovery.
Enable centralized management.
Build a true Software-Defined Data Center.
Emulation Virtualization
In emulation virtualization, the virtual machine simulates the hardware and hence becomes independent of it. In
this case, the guest operating system does not require modification.
Para-virtualization
In para-virtualization, the hardware is not simulated. The guest software runs in its own isolated
domain.
Network virtualization. VLANs – virtual networks – have been around for a long time. A VLAN is a
group of systems that communicate in the same broadcast domain, regardless of the physical location
of each node. By creating and configuring VLANs on physical networking hardware, a network
administrator can place two hosts – one in New York City and one in Shanghai – on what appears to
these hosts to be the same physical network. The hosts will communicate with one another under this
scenario. This abstraction has made it easy for companies to move away from simply using physical
connections to define networks and be able to create less expensive networks that are flexible and
meet ongoing business needs.
Desktop virtualization. Desktop and server virtualization are two sides of the same coin. Both
involve virtualization of entire systems, but there are some key differences. Server virtualization
involves abstracting server-based workloads from the underlying hardware, which are then delivered
to clients as normal. Clients don’t see any difference between a physical and virtual server. Desktop
virtualization, on the other hand, virtualizes the traditional desktop and moves the execution of that
client workload to the data center. Those workloads are then accessed via a number of different
methods, such as thin clients or other means.
Virtualization is a computer architecture technology by which multiple virtual machines (VMs) are multiplexed
in the same hardware machine.
The purpose of a VM is to enhance resource sharing by many users and improve computer performance in
terms of resource utilization and application flexibility.
Hardware resources (CPU, memory, I/O devices, etc.) or software resources (operating system and software
libraries) can be virtualized in various functional layers.
Levels of Virtualization Implementation
A traditional computer runs with a host operating system specially tailored for its hardware architecture, as
shown in following (a).
At the ISA level, virtualization is performed by emulating a given ISA by the ISA of the host machine.
For example, MIPS binary code can run on an x86-based host machine with the help of ISA emulation.
Instruction set emulation leads to virtual ISAs created on any hardware machine.
The basic emulation method is through code interpretation
Instruction set emulation requires binary translation and optimization.
A virtual instruction set architecture (V-ISA) thus requires adding a processor-specific software
translation layer to the compiler.
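As a rough illustration of the code-interpretation approach mentioned above, the sketch below interprets a tiny, made-up instruction set on the host; the instruction set and the sample program are invented purely for illustration and do not correspond to any real ISA.

    # Toy ISA interpreter (sketch): each guest instruction is decoded and emulated by
    # host code, which is the basic (and slowest) form of ISA-level virtualization.
    def interpret(program):
        regs = {"r0": 0, "r1": 0}
        pc = 0
        while pc < len(program):
            op, *args = program[pc]
            if op == "li":                  # load immediate: ("li", "rX", value)
                regs[args[0]] = args[1]
            elif op == "add":               # add: ("add", "rX", "rY") -> rX += rY
                regs[args[0]] += regs[args[1]]
            elif op == "print":             # print: ("print", "rX")
                print(regs[args[0]])
            pc += 1                         # move to the next guest instruction
        return regs

    # A tiny "guest" program written in the made-up instruction set.
    interpret([("li", "r0", 2), ("li", "r1", 3), ("add", "r0", "r1"), ("print", "r0")])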
Hardware-level virtualization inserts a layer between real hardware and traditional operating systems.
This layer is commonly called the Virtual Machine Monitor (VMM) and it manages the hardware
resources of a computing system.
The VMM acts as a traditional OS.
One hardware component, such as the CPU, can be virtualized as several virtual copies.
Therefore, several traditional operating systems which are the same or different can sit on the same set
of hardware simultaneously.
Three requirements for a VMM
1. Environment for programs which is essentially identical to the original machine.
2. Programs run in this environment should experience only minor decreases in speed.
3. VMM should be in complete control of the system resources
Complete control of these resources by a VMM includes the following aspects:
1. The VMM is responsible for allocating hardware resources for programs
2. It is not possible for a program to access any resource not explicitly allocated to it.
3. It is possible under certain circumstances for a VMM to regain control of resources already allocated.
Disadvantages of OS Extensions
1. All the VMs at operating system level on a single container must have the same kind of guest
operating system.
2. To implement OS-level virtualization, isolated execution environments (VMs) should be created based
on a single OS kernel.
3. The access requests from a VM need to be redirected to the VM’s local resource partition on the
physical machine.
Hypervisor (VMM)
A hypervisor is a hardware virtualization technique allowing multiple guest OSes to run on a host machine.
It provides hypercalls for the guest OSes and applications.
Depending on the functionality, a hypervisor can assume a micro-kernel architecture or a
monolithic hypervisor architecture (like VMware ESX for server virtualization). Types of Hypervisor:
A Type 1 hypervisor runs on the bare metal.
The core components of a Xen system are the hypervisor, kernel, and applications. The organization of
the three components is important.
Like other virtualization systems, many guest OSes can run on top of the hypervisor. However, not all
guest OSes are created equal, and one in particular controls the others.
The guest OS, which has control ability, is called Domain 0, and the others are called Domain U.
Domain 0 is a privileged guest OS of Xen.
It is first loaded when Xen boots without any file system drivers being available. Domain 0 is
designed to access hardware directly and manage devices.
Therefore, one of the responsibilities of Domain 0 is to allocate and map hardware resources for the
guest domains (the Domain U domains).
CPU VIRTUALIZATION
A VM is a duplicate of an existing computer system in which a majority of the VM instructions are
executed on the host processor in native mode. Thus, unprivileged instructions of VMs run directly on
the host machine for higher efficiency.
Other critical instructions should be handled carefully for correctness and stability.
The critical instructions are divided into three categories: privileged instructions, control sensitive
instructions, and behavior-sensitive instructions.
CPU architecture is virtualizable if it supports the ability to run the VM’s privileged and unprivileged
instructions in the CPU’s user mode while the VMM runs in supervisor mode.
When the privileged instructions including control- and behavior-sensitive instructions of a VM are
executed, they are trapped in the VMM. In this case, the VMM acts as a unified mediator for hardware
access from different VMs to guarantee the correctness and stability of the whole system.
RISC CPU architectures can be naturally virtualized because all control- and behavior-sensitive
instructions are privileged instructions.
On the contrary, x86 CPU architectures are not primarily designed to support virtualization. This is
because about 10 sensitive instructions, such as SGDT and SMSW, are not privileged instructions.
These instructions cannot be trapped in the VMM.
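A conceptual sketch of the trap-and-emulate idea described above: unprivileged guest instructions run directly, while sensitive ones trap into a VMM handler. The instruction names and handler logic here are invented for illustration and do not model a real CPU or hypervisor.

    # Trap-and-emulate sketch: the VMM mediates only the privileged/sensitive instructions.
    SENSITIVE = {"HLT", "OUT", "LGDT"}      # illustrative set of sensitive operations

    def vmm_handle(vm_name, instr):
        # The VMM emulates the effect of the instruction on behalf of the VM.
        print(f"[VMM] trapped {instr} from {vm_name}, emulating it safely")

    def run_guest(vm_name, instructions):
        for instr in instructions:
            if instr in SENSITIVE:
                vmm_handle(vm_name, instr)  # trap into the VMM
            else:
                print(f"[{vm_name}] {instr} executed directly on the host CPU")

    run_guest("vm1", ["ADD", "MOV", "OUT", "ADD", "HLT"])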
CPU virtualization is related to a range of protection levels, called rings, in which code can execute.
The Intel x86 CPU architecture offers four levels of privilege, known as Rings 0, 1, 2, and 3.
Rings 0, 1, and 2 are associated with the OS and manage access to the computer hardware.
Ring 3 is reserved for user applications.
Ring 0 is used by the kernel; because of that, Ring 0 has the highest privilege level.
Ring 3 has the lowest privilege, as it belongs to user-level applications.
User-level applications typically run in Ring 3; the OS needs to have direct access to the memory
and hardware and must execute its privileged instructions in Ring 0.
This approach was implemented by VMware and many other software companies. As shown in
following figure VMware puts the VMM at Ring 0 and the guest OS at Ring 1.
The VMM scans the instruction stream and identifies the privileged, control- and behavior-sensitive
instructions.
The method used in this emulation is called binary translation. Therefore, full virtualization combines
binary translation and direct execution.
The guest OS is completely decoupled from the underlying hardware. Consequently, the guest OS is
unaware that it is being virtualized.
Host-Based Virtualization
An alternative VM architecture is to install a virtualization layer on top of the host OS. This host OS is
still responsible for managing the hardware. The guest OSes are installed and run on top of the
virtualization layer.
This host-based architecture has some distinct advantages, as enumerated next. First, the user can
install this VM architecture without modifying the host OS.
The virtualizing software can rely on the host OS to provide device drivers and other low-level
services.
This will simplify the VM design and ease its deployment.
Second, the host-based approach appeals to many host machine configurations.
Compared to the hypervisor/VMM architecture, the performance of the host-based architecture may
also be low.
2. Para-Virtualization with Compiler Support
The traditional x86 processor offers four instruction execution rings: Rings 0, 1, 2, and 3.
The lower the ring number, the higher the privilege of instruction being executed.
The OS is responsible for managing the hardware and the privileged instructions to execute at Ring 0,
while user-level applications run at Ring 3.
The best example of para-virtualization is the KVM to be described below.
Para-Virtualization Architecture
When the x86 processor is virtualized, a virtualization layer is inserted between the hardware and the
OS.
According to the x86 ring definitions, the virtualization layer should also be installed at Ring 0.
Different instructions at Ring 0 may cause some problems.
Para-virtualization replaces non virtualizable instructions with hypercalls that communicate directly
with the hypervisor or VMM.
However, when the guest OS kernel is modified for virtualization, it can no longer run on the hardware
directly.
The problems with para-virtualization are compatibility and portability, the high cost of maintaining
para-virtualized OSes, and the fact that its performance advantage varies greatly due to workload variations.
In KVM, memory management and scheduling are carried out by the existing Linux kernel; KVM does the
rest, which makes it simpler than a hypervisor that controls the entire machine.
KVM is a hardware-assisted para-virtualization tool, which improves performance and supports
unmodified guest OSes such as Windows, Linux, Solaris, and other UNIX variants.
2. Memory Virtualization
Virtual memory virtualization is similar to the virtual memory support provided by modern operating
Systems.
All modern x86 CPUs include a memory management unit (MMU) and a translation lookaside buffer
(TLB) to optimize virtual memory performance. However, in a virtual execution environment, virtual
memory virtualization involves sharing the physical system memory in RAM and dynamically
allocating it to the physical memory of the VMs.
That means a two-stage mapping process should be maintained by the guest OS and the VMM,
respectively: virtual memory to physical memory and physical memory to machine memory.
Furthermore, MMU virtualization should be supported, which is transparent to the guest OS.
The guest OS continues to control the mapping of virtual addresses to the physical memory addresses
of VMs. But the guest OS cannot directly access the actual machine memory.
The VMM is responsible for mapping the guest physical memory to the actual machine memory.
Following figure shows the two-level memory mapping procedure.
Each page table of the guest OSes has a separate page table in the VMM corresponding to it; this VMM
page table is called the shadow page table.
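A toy sketch of the two-stage mapping just described: the guest OS maps virtual pages to what it believes are physical pages, the VMM maps those to machine pages, and the shadow table pre-combines the two. The page numbers and table contents are invented for illustration.

    # Two-stage address mapping sketch: guest virtual -> guest physical -> machine memory.
    guest_page_table = {0: 5, 1: 7}        # guest OS: virtual page -> guest-physical page
    vmm_page_table = {5: 42, 7: 99}        # VMM: guest-physical page -> machine page

    # The VMM can pre-combine both stages into a "shadow" table (virtual -> machine).
    shadow_page_table = {v: vmm_page_table[p] for v, p in guest_page_table.items()}

    def translate(virtual_page):
        """Translate a guest virtual page number directly to a machine page number."""
        return shadow_page_table[virtual_page]

    print(translate(0))   # 42
    print(translate(1))   # 99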
There are three ways to implement I/O virtualization: full device emulation, para-virtualization, and
direct I/O.
Full device emulation is the first approach for I/O virtualization. Generally, this approach emulates
well-known, real-world devices.
The frontend driver is running in Domain U and the backend driver is running in Domain 0. They
interact with each other via a block of shared memory.
The frontend driver manages the I/O requests of the guest OSes and the backend driver is
responsible for managing the real I/O devices and multiplexing the I/O data of different VMs.
Although para-I/O-virtualization achieves better device performance than full device emulation, it
comes with a higher CPU overhead.
Direct I/O virtualization lets the VM access devices directly. It can achieve close-to-native
performance without high CPU costs.
However, current direct I/O virtualization implementations focus on networking for mainframes.
There are a lot of challenges for commodity hardware devices.
Introduction:
Amazon Web Services (AWS) is a comprehensive and widely-used cloud computing platform provided
by Amazon.com.
It offers a vast array of on-demand cloud services that enable businesses and individuals to build,
deploy, and manage various applications and services in a highly scalable and cost-effective manner.
AWS provides a wide range of services across numerous categories, including computing power,
storage, databases, networking, machine learning, analytics, security, and more. These services are
designed to meet the requirements of different use cases, from small startups to large enterprises.
10. Amazon Elastic Beanstalk: It simplifies the deployment and management of applications by providing
an easy-to-use platform for deploying and scaling web applications developed in various languages.
1. Continued Growth and Innovation: AWS has been consistently expanding its portfolio of
services and features, catering to a wide range of customer needs. As technology advances,
AWS is expected to introduce more innovative solutions, especially in areas like artificial
intelligence (AI), machine learning (ML), Internet of Things (IoT), and serverless computing.
The company's focus on research and development, coupled with its large customer base,
positions it to maintain its leadership in the market.
2. Hybrid and Multi-Cloud Solutions: Many organizations are adopting a hybrid cloud strategy,
combining on-premises infrastructure with public cloud services. AWS recognizes this trend
and offers solutions like AWS Outposts, which brings AWS infrastructure and services to on-
premises data centers. Additionally, AWS has partnerships with other major cloud providers,
enabling customers to deploy applications seamlessly across multiple clouds. Expect AWS to
continue expanding its hybrid and multi-cloud offerings to cater to evolving customer
requirements.
3. Edge Computing and IoT: As the number of connected devices increases, there is a growing
demand for processing data at the edge to reduce latency and improve real-time decision-
making. AWS has already made strides in this area with services like AWS Greengrass and
AWS IoT Core. In the future, AWS is likely to enhance its edge computing capabilities,
enabling organizations to deploy and manage applications at the edge efficiently.
4. Advanced Analytics and AI/ML: Data analytics, AI, and ML are transforming industries
across the board. AWS has a robust suite of analytics and AI/ML services, including Amazon
Redshift, Amazon Athena, Amazon SageMaker, and Amazon Rekognition. AWS is expected to
invest further in these areas, enabling customers to extract valuable insights from their data and
build sophisticated AI/ML models.
1. Computing Services:
Amazon Elastic Compute Cloud (EC2): Virtual servers in the cloud for running applications.
AWS Lambda: Serverless compute service for running code without provisioning or managing
servers.
AWS Batch: Fully managed batch processing at any scale.
2. Storage Services:
Amazon Simple Storage Service (S3): Scalable object storage for storing and retrieving data.
Amazon Elastic Block Store (EBS): Persistent block-level storage volumes for EC2 instances.
Amazon Glacier: Low-cost storage service for archiving and long-term backup.
3. Database Services:
Amazon Relational Database Service (RDS): Managed database service for popular relational
databases.
Amazon DynamoDB: Fully managed NoSQL database service.
Amazon Aurora: MySQL and PostgreSQL-compatible relational database with high
performance and availability.
4. Networking Services:
Amazon Virtual Private Cloud (VPC): Isolated virtual network to launch AWS resources.
AWS Direct Connect: Dedicated network connection between on-premises infrastructure and
AWS.
Amazon Route 53: Scalable domain name system (DNS) web service.
5. Analytics Services:
Amazon Redshift: Fast, fully managed data warehousing service.
Amazon Athena: Serverless query service for analyzing data in Amazon S3.
Amazon Kinesis: Real-time streaming data processing service.
AWS IoT Core: Managed cloud platform for securely connecting and managing IoT devices.
AWS IoT Analytics: Analytics service for IoT devices and data.
Amazon EC2
Amazon Elastic Compute Cloud (EC2) is a core service offered by Amazon Web Services (AWS) that
provides resizable compute capacity in the cloud. It enables users to easily launch and manage virtual
servers, known as EC2 instances, to run their applications.
1. Virtual Servers: EC2 allows users to create virtual servers in the cloud, known as EC2
instances. Users can choose from a variety of instance types that vary in terms of computing
power, memory, storage, and networking capacity, allowing them to select the most suitable
instance for their workload.
2. Scalability: EC2 provides auto-scaling capabilities, allowing users to automatically adjust the
number of instances based on the workload demands. This enables applications to handle
fluctuations in traffic and ensures optimal performance and cost efficiency.
3. Flexible Pricing Options: EC2 offers various pricing models to match different usage patterns
and requirements. Users can choose from On-Demand Instances (pay-as-you-go), Reserved
Instances (upfront commitment for discounted pricing), and Spot Instances (bid-based pricing
for unused capacity). This flexibility allows users to optimize costs based on their specific
needs.
4. Multiple Operating Systems: EC2 supports a wide range of operating systems, including
popular Linux distributions, Windows Server, and other specialized operating systems. This
allows users to run their applications on the preferred operating system.
5. Security and Networking: EC2 provides robust security features, including virtual private
cloud (VPC) integration, security groups, network access control lists (ACLs), and the ability to
configure firewall settings. Users have control over network connectivity and can configure
private subnets, define access rules, and establish VPN connections.
6. Storage Options: EC2 offers various storage options to meet different application
requirements. Users can attach Elastic Block Store (EBS) volumes as persistent block-level
storage to their instances. Additionally, users can leverage Amazon S3 for object storage,
Amazon Elastic File System (EFS) for scalable file storage, and instance store volumes for
temporary storage.
7. Integration with Other AWS Services: EC2 integrates seamlessly with other AWS services,
enabling users to leverage the full capabilities of the AWS ecosystem. This includes services
like Amazon RDS for managed databases, Amazon S3 for object storage, AWS Lambda for
serverless computing, and more.
8. Monitoring and Management: EC2 provides monitoring and management tools to help users
monitor the health, performance, and utilization of their instances. Users can utilize Amazon
CloudWatch to collect and analyze metrics, set up alarms, and automate actions based on
predefined rules.
EC2 is widely used by organizations of all sizes, from startups to enterprises, to deploy a wide range of
applications, including web servers, databases, data processing, and machine learning workloads. Its
scalability, flexibility, and integration with other AWS services make it a popular choice for running
applications in the cloud.
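For concreteness, here is a minimal boto3 sketch that launches a single On-Demand EC2 instance; the AMI ID, region, instance type, and key pair name are placeholders, and AWS credentials are assumed to be configured in the environment.

    # Sketch: launch one on-demand EC2 instance with boto3 (all identifiers are placeholders).
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    response = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",   # placeholder AMI ID
        InstanceType="t2.micro",
        MinCount=1,
        MaxCount=1,
        KeyName="my-key-pair",             # placeholder key pair name
    )
    instance_id = response["Instances"][0]["InstanceId"]
    print("Launched instance:", instance_id)

    # Terminate it when no longer needed, so you pay only for what you use.
    # ec2.terminate_instances(InstanceIds=[instance_id])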
EBS:
EBS stands for Elastic Block Store.
EC2 is a virtual server in a cloud while EBS is a virtual disk in a cloud.
Amazon EBS allows you to create storage volumes and attach them to the EC2 instances.
Once the storage volume is created, you can create a file system on top of these volumes; you can then
run a database, store files and applications, or even use them as a block device in some
other way.
Amazon EBS volumes are placed in a specific availability zone, and they are automatically replicated
to protect you from the failure of a single component.
An EBS volume does not exist on a single disk; it is spread across the Availability Zone. An EBS volume is a
disk which is attached to an EC2 instance.
The EBS volume attached to an EC2 instance on which Windows or Linux is installed is known as the root
device volume.
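A short boto3 sketch of creating an EBS volume in an Availability Zone and attaching it to a running instance; the region, zone, size, instance ID, and device name are placeholders.

    # Sketch: create an EBS volume and attach it to an EC2 instance (placeholder values).
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    volume = ec2.create_volume(
        AvailabilityZone="us-east-1a",     # an EBS volume lives in one Availability Zone
        Size=10,                           # size in GiB
        VolumeType="gp2",                  # general-purpose SSD
    )
    volume_id = volume["VolumeId"]

    # Wait until the volume is available, then attach it to a running instance.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.attach_volume(
        VolumeId=volume_id,
        InstanceId="i-0123456789abcdef0",  # placeholder instance ID
        Device="/dev/sdf",
    )
    print("Attached", volume_id)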
SSD:
SSD stands for Solid-State Drive.
In June 2014, SSD storage was introduced.
It is a general-purpose storage type.
It supports up to 4,000 IOPS, which is quite high.
SSD storage is very high performing, but it is quite expensive as compared to HDD (Hard Disk Drive)
storage.
SSD volume types are optimized for transactional workloads such as frequent read/write operations
with small I/O size, where the performance attribute is IOPS.
Now, we have different Amazon Machine Images. These are the snapshots of different virtual machines. We
will be using Amazon Linux AMI 2018.03.0 (HVM) as it has built-in tools such as Java, Python, Ruby, Perl,
and especially the AWS command line tools.
The main setup page of EC2 is shown below where we define setup configuration.
2. Scalability and Durability: S3 is built to scale and provides high durability for stored data. It
automatically replicates data across multiple devices and multiple geographically distributed data
centers, ensuring durability and availability.
3. Storage Classes: S3 offers multiple storage classes to meet different requirements and optimize
costs. These include:
Standard: The default storage class with high durability, availability, and low latency access.
Intelligent-Tiering: Automatically moves objects between frequent and infrequent access tiers
based on usage patterns.
Glacier: Suitable for long-term archival storage with lower costs but longer retrieval times.
Glacier Deep Archive: Designed for archival storage with the lowest cost and longer retrieval
times.
4. Data Transfer and Access Control: S3 provides secure data transfer over HTTPS and allows
granular access control. Access permissions can be managed using AWS Identity and Access
Management (IAM), bucket policies, and Access Control Lists (ACLs).
5. Versioning and Lifecycle Policies: S3 supports versioning, allowing users to preserve, retrieve,
and restore previous versions of objects. Lifecycle policies enable automated transitions of objects
between storage classes based on predefined rules, helping optimize costs (see the sketch after this list).
6. Data Management and Analytics: S3 integrates with various AWS services for data management
and analytics purposes. This includes Amazon Athena for ad hoc querying of data using standard
SQL, Amazon Redshift for data warehousing, and Amazon Macie for data discovery and security.
7. Event Notifications and Triggers: S3 supports event notifications and triggers through Amazon
Simple Notification Service (SNS) and AWS Lambda. This enables users to respond to changes in
their S3 buckets, such as new object uploads or deletions, by triggering actions or workflows.
9. Security and Compliance: S3 incorporates robust security features, including encryption at rest
and in transit, access control mechanisms, and integration with AWS Identity and Access
Management (IAM). It also supports compliance standards and regulations such as HIPAA, GDPR,
and PCI DSS.
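A boto3 sketch of the lifecycle idea from item 5 above, transitioning objects under a prefix to Glacier after a number of days; the bucket name, prefix, and day count are placeholders.

    # Sketch: lifecycle rule that moves older objects to the Glacier storage class.
    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_lifecycle_configuration(
        Bucket="my-example-bucket",                    # placeholder bucket name
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "archive-old-logs",
                    "Filter": {"Prefix": "logs/"},     # apply only to this prefix
                    "Status": "Enabled",
                    "Transitions": [
                        {"Days": 90, "StorageClass": "GLACIER"}   # move after 90 days
                    ],
                }
            ]
        },
    )
    print("Lifecycle rule applied")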
AWS S3 is widely used for a variety of use cases, including backup and restore, content storage and
distribution, data lakes and analytics, application hosting, and media hosting. Its scalability, durability,
cost-effectiveness, and rich set of features make it a fundamental component of many cloud-based
applications and architectures.
Advantages of S3:
Amazon S3 Concepts:
1. Buckets
A bucket is a container used for storing the objects.
Every object is incorporated in a bucket.
For example, if the object named photos/tree.jpg is stored in the treeimage bucket, then it can be
addressed by using the URL http://treeimage.s3.amazonaws.com/photos/tree.jpg.
A bucket has no limit on the number of objects that it can store. No bucket can exist inside another
bucket.
S3 performance remains the same regardless of how many buckets have been created.
The AWS user that creates a bucket owns it, and no other AWS user can own it. Therefore, we can
say that the ownership of a bucket is not transferable.
The AWS account that creates a bucket can delete a bucket, but no other AWS user can delete the
bucket.
2. Objects
Objects are the entities which are stored in an S3 bucket.
An object consists of object data and metadata, where the metadata is a set of name-value pairs that
describe the data.
An object has some default metadata, such as the date last modified, and standard HTTP metadata,
such as Content-Type. Custom metadata can also be specified at the time of storing an object.
It is uniquely identified within a bucket by key and version ID.
3. Key
A key is a unique identifier for an object.
Every object in a bucket is associated with one key.
An object can be uniquely identified by using a combination of bucket name, the key, and optionally
version ID.
For example, in the URL http://jtp.s3.amazonaws.com/2019-01-31/Amazons3.wsdl where "jtp" is the
bucket name, and key is "2019-01-31/Amazons3.wsdl"
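A minimal boto3 sketch showing how a bucket name and key together address an object, echoing the URL examples above; the bucket name is a placeholder and must be globally unique, and credentials are assumed to be configured.

    # Sketch: store and retrieve an object by bucket + key (bucket name is a placeholder).
    import boto3

    s3 = boto3.client("s3")
    bucket, key = "my-example-bucket", "photos/tree.jpg"

    # Upload (PUT) an object under the given key.
    s3.put_object(Bucket=bucket, Key=key, Body=b"...image bytes...")

    # Download (GET) the same object back using the bucket name and key.
    obj = s3.get_object(Bucket=bucket, Key=key)
    print(len(obj["Body"].read()), "bytes retrieved for", bucket + "/" + key)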
4. Regions
You can choose a geographical region in which you want to store the buckets that you have created.
A region is chosen in such a way that it optimizes latency, minimizes costs, or addresses regulatory
requirements.
Objects will not leave the region unless you explicitly transfer the objects to another region.
5. Data Consistency Model
Amazon S3 replicates the data to multiple servers to achieve high availability.
Two types of model:
1. Read-after-write consistency for PUTs of new objects.
For a PUT request, S3 stores the data across multiple servers to achieve high availability.
When a process stores a new object to S3, it is immediately available to read.
When a process stores a new object to S3, the key is immediately listed within the bucket.
The changes do not take time to propagate; they are reflected immediately.
2. Eventual consistency for overwrite PUTs and DELETEs
For PUTs that overwrite existing objects and for DELETEs, the changes are reflected eventually
rather than immediately.
If a process replaces an existing object with a new object and you read it immediately, S3 might
return the prior data until the change is fully propagated.
If a process deletes an existing object and you read it immediately, S3 might still return the
deleted data until the change is fully propagated.
Similarly, if a process deletes an existing object and you immediately list the keys within the
bucket, S3 might still include the deleted key until the change is fully propagated.
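As a quick CLI sketch of the bucket, object, and key concepts above (the bucket and file names
here are hypothetical and assume the AWS CLI is already configured):
# Create a bucket (bucket names are globally unique)
aws s3 mb s3://treeimage-demo
# Upload a local file; "photos/tree.jpg" becomes the object's key
aws s3 cp tree.jpg s3://treeimage-demo/photos/tree.jpg
# List the objects (keys) stored in the bucket
aws s3 ls s3://treeimage-demo --recursive
The uploaded object is then addressable as http://treeimage-demo.s3.amazonaws.com/photos/tree.jpg,
i.e. the bucket name plus the key photos/tree.jpg.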
Cloud Storage:
Cloud storage is a cloud computing model that stores data on the Internet
through a cloud computing provider who manages and operates data storage
as a service.
It’s delivered on demand with just-in-time capacity and costs, and eliminates
buying and managing your own data storage infrastructure. This gives you
agility, global scale and durability, with “anytime, anywhere” data access.
Cloud storage is purchased from a third party cloud vendor who owns and operates
data storage capacity and delivers it over the Internet in a pay-as-you-go model.
These cloud storage vendors manage capacity, security and durability to make data
accessible to your applications all around the world.
Applications access cloud storage through traditional storage protocols or directly via
an API. Many vendors offer complementary services designed to help collect,
manage, secure and analyze data at massive scale.
1. Public Cloud Storage: Suitable for unstructured data, public cloud storage is offered by
third-party cloud storage providers over the open Internet. They may be available for free or
on a paid basis. Users are usually required to pay for only what they use.
2. Hybrid Cloud Storage: As the name suggests, hybrid cloud storage allows data and applications to
be shared between a public and a private cloud. Businesses with a private, on-premises solution can
seamlessly scale up to the public cloud to handle any short-term spikes or overflow.
1. Object Storage - Applications developed in the cloud often take advantage of object
storage’s vast scalability and metadata characteristics. Object storage solutions like
Amazon Simple Storage Service (S3) are ideal for building modern applications from
scratch that require scale and flexibility, and can also be used to import existing data stores
for analytics, backup, or archive.
2. File Storage - Some applications need to access shared files and require a file system. This
type of storage is often supported with a Network Attached Storage (NAS) server. File
storage solutions like Amazon Elastic File System (EFS) are ideal for use cases like large
content repositories, development environments, media stores, or user home directories.
3. Block Storage - Other enterprise applications like databases or ERP systems often require
dedicated, low-latency storage for each host. This is analogous to direct-attached storage
(DAS) or a Storage Area Network (SAN). Block-based cloud storage solutions like
Amazon Elastic Block Store (EBS) are provisioned with each virtual server and offer the
ultra low latency required for high performance workloads.
The availability, durability, and cost benefits of cloud storage can be very compelling
to business owners, but traditional IT functional owners like storage, backup,
networking, security, and compliance administrators may have concerns around the
realities of transferring large amounts of data to the cloud.
Cloud data migration services such as AWS Import/Export Snowball can
simplify migrating storage into the cloud by addressing high network costs, long
transfer times, and security concerns.
4. Compliance
Storing data in the cloud can raise concerns about regulation and compliance, especially
if this data is already stored in compliant storage systems. Cloud data compliance
controls like
Amazon Glacier Vault Lock are designed to ensure that you can easily deploy and
enforce compliance controls on individual data vaults via a lockable policy.
You can specify controls such as Write Once Read Many (WORM) to lock the data
from future edits.
Using audit log products like AWS CloudTrail can help you ensure compliance and
governance objectives for your cloud-based storage and archival systems are being
met.
AWS IAM:
AWS Identity and Access Management (IAM) is a service provided by Amazon Web Services
(AWS) that enables users to manage access and permissions to AWS resources. IAM allows
organizations to create and control multiple users, groups, and roles, and define fine-grained
permissions to access various AWS services and resources.
Users: IAM allows you to create individual users within your AWS account. Each user is assigned
a unique access key and secret access key that they can use to interact with AWS programmatically
or through the AWS Command Line Interface (CLI).
Groups: IAM groups are collections of IAM users. By assigning permissions to groups, you can
manage permissions for multiple users collectively, simplifying access management.
Roles: IAM roles are used to grant temporary access to AWS resources to entities like IAM users,
applications, or AWS services. Roles can be assumed by entities and inherit permissions associated
with the role. This approach improves security by reducing the need for long-term credentials.
Policies: IAM policies are JSON documents that define permissions. They are attached to IAM
users, groups, and roles to specify what actions they can perform on which resources. Policies can
be custom-defined or leverage AWS-managed policies that cover common use cases.
Access Control: IAM provides fine-grained access control through policies. Policies allow you to
specify the actions that are allowed or denied, the resources on which the actions can be performed,
and the conditions under which the permissions are granted.
Multi-Factor Authentication (MFA): IAM supports MFA, which adds an extra layer of security
to user sign-ins. With MFA, users are required to provide an additional authentication factor, such
as a one-time password generated by a virtual or physical MFA device.
Identity Federation: IAM supports identity federation, allowing you to grant temporary access to
AWS resources to users authenticated by external identity providers (such as Active Directory,
Facebook, or Google). This enables organizations to use existing identity systems and extend them
to AWS.
AWS Organizations Integration: IAM can be integrated with AWS Organizations, a service that
allows you to manage multiple AWS accounts centrally. This integration enables you to apply and
enforce policies across multiple accounts, making it easier to manage permissions and access
control.
Audit and Compliance: IAM provides features for monitoring and auditing user activity, including
AWS CloudTrail integration. CloudTrail records API calls made to IAM and other AWS services,
allowing you to track actions taken by users and meet compliance requirements.
To create a service role in the IAM console, you choose the AWS service that will use the role and
then select the managed policy that attaches the required permissions to that service.
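As a hedged sketch of how users, groups, and policies fit together using the AWS CLI (the user,
group, policy names, and bucket ARN below are hypothetical examples):
# Create an IAM user and a group, and add the user to the group
aws iam create-user --user-name alice
aws iam create-group --group-name developers
aws iam add-user-to-group --user-name alice --group-name developers
# Attach an AWS-managed policy to the group so every member inherits it
aws iam attach-group-policy --group-name developers \
    --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
# A custom policy document (JSON) allowing read access to one bucket,
# saved locally as s3-read.json:
# {
#   "Version": "2012-10-17",
#   "Statement": [{
#     "Effect": "Allow",
#     "Action": ["s3:GetObject", "s3:ListBucket"],
#     "Resource": ["arn:aws:s3:::example-bucket", "arn:aws:s3:::example-bucket/*"]
#   }]
# }
aws iam put-group-policy --group-name developers \
    --policy-name s3-read --policy-document file://s3-read.json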
AWS Security:
1. Shared Responsibility Model: AWS follows a shared responsibility model, where AWS is responsible
for the security of the cloud infrastructure, while customers are responsible for securing their
applications and data running on AWS. This model ensures a collaborative approach to security.
2. Identity and Access Management (IAM): AWS IAM allows users to manage access to AWS
resources by creating and managing users, groups, and roles. IAM enables fine-grained access control
through policies and supports multi-factor authentication (MFA) for added security.
3. Encryption: AWS provides robust encryption options to protect data in transit and at rest. Transport
Layer Security (TLS) encryption is used for securing data in transit, while AWS Key Management
Service (KMS) enables customers to manage encryption keys for data at rest, including database
storage, EBS volumes, and S3 objects.
4. Network Security: Amazon Virtual Private Cloud (VPC) allows users to create isolated virtual
networks in the cloud. VPC provides granular control over network settings, including subnets,
security groups, and network access control lists (ACLs). AWS also offers Distributed Denial of
Service (DDoS) protection and AWS Shield for protecting applications against cyberattacks.
5. Monitoring and Logging: AWS offers various monitoring and logging services to help users track and
analyze security events. Amazon CloudWatch allows users to monitor resources and receive real-time
insights, while AWS CloudTrail records API calls made to AWS services for auditing and compliance
purposes.
6. Security Compliance and Certifications: AWS has numerous compliance certifications, including
SOC 1, SOC 2, ISO 27001, HIPAA, and PCI DSS, among others. These certifications ensure that
AWS meets industry-recognized security standards and can support customers in meeting their
compliance requirements.
7. Incident Response and Forensics: AWS provides services and features to help customers respond to
security incidents and perform forensic investigations. This includes AWS Incident Response, which
provides guidance and support during security incidents, and AWS Artifact, which provides access to
AWS compliance reports and documents.
8. Security Automation: AWS offers automation tools such as AWS Config, AWS CloudFormation, and
AWS Systems Manager to help users implement security best practices and automate security-related
tasks, ensuring consistent security configurations across environments.
9. Partner Ecosystem: AWS has a broad partner ecosystem that includes security vendors and solutions.
Customers can leverage these partners to enhance their security posture, implement advanced threat
detection and prevention, and achieve comprehensive security solutions.
It's important to note that while AWS provides a secure infrastructure, customers must implement
security best practices and configure their applications and resources correctly to ensure optimal
security. AWS provides extensive documentation, best practice guides, and security whitepapers to help
users understand and implement effective security measures.
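As an illustration of the encryption options in item 3 above, a hedged example of uploading an
object with server-side encryption enabled (the bucket and file names are placeholders):
# Server-side encryption with S3-managed keys (SSE-S3)
aws s3 cp report.pdf s3://my-secure-bucket/report.pdf --sse AES256
# Server-side encryption with an AWS KMS key (SSE-KMS)
aws s3 cp report.pdf s3://my-secure-bucket/report.pdf --sse aws:kms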
How IAM works:
1. Identity Creation: You start by creating IAM identities, which can be users, groups, or roles.
2. Users: IAM users are individual identities associated with a person or an application that
interact with AWS resources. Users are assigned unique access credentials (access key ID and
secret access key) for programmatic access or can use the AWS Management Console with their
own username and password.
3. Groups: IAM groups are collections of IAM users. You can assign permissions to groups
instead of managing permissions individually for each user. Users can be added or removed
from groups as needed, and they inherit the permissions assigned to the group.
4. Roles: IAM roles are used to delegate access to entities like IAM users, AWS services, or even
external identity providers. Roles have policies attached to them, specifying the permissions that
can be assumed by entities assuming the role. Roles are often used for granting temporary
access or for cross-account access.
5. Policy Creation: IAM policies define what actions are allowed or denied on AWS resources.
Policies are JSON documents that specify the permissions and resources associated with
identities or roles. You can create custom policies or use AWS-managed policies that cover
common use cases. Policies can be attached to users, groups, or roles.
6. Access Control: IAM enables fine-grained access control based on policies. Policies define
permissions using AWS service actions, resource ARNs (Amazon Resource Names), conditions,
and more. You can grant or deny permissions at the service, resource, or even individual API
operation level. IAM policies can also include conditions that define additional constraints for
access.
7. Authentication and Authorization: IAM handles authentication and authorization for AWS
resources. When an IAM user or entity requests access to an AWS resource, IAM authenticates
the identity and verifies the permissions associated with that identity. IAM ensures that users
and entities have the necessary permissions to perform actions on resources based on the
policies assigned to them.
8. Integration with AWS Services: IAM integrates with various AWS services to enable secure
access and control. For example, IAM roles can be assumed by AWS services to grant them
permissions to perform actions on your behalf. IAM policies can also be used to grant access to
specific AWS resources like S3 buckets, EC2 instances, or RDS databases.
9. Monitoring and Auditing: IAM activity can be monitored and audited using AWS CloudTrail.
CloudTrail records API calls made to IAM and other AWS services, allowing you to track
changes, detect unauthorized access attempts, and comply with security and auditing
requirements.
IAM provides a centralized and secure way to manage access to AWS resources. It follows the principle
of least privilege, ensuring that users and entities have only the necessary permissions to perform their
tasks. By using IAM, you can effectively control and secure access to your AWS environment.
AWS CloudFront:
Amazon CloudFront is AWS's content delivery network (CDN) service. Its main components are:
1. Origin: The origin is the source of the content that CloudFront delivers to end users. It can be
an Amazon S3 bucket, an Elastic Load Balancer, an EC2 instance, or a custom HTTP server
outside of AWS. CloudFront retrieves the content from the origin server when it is not present
in its cache or when the content has expired.
2. Distribution: A distribution is a collection of edge locations that serve content to end users.
When you create a CloudFront distribution, you specify the origin(s) from which CloudFront
should retrieve content. There are two types of distributions: web distributions for delivering
web content and RTMP (Real-Time Messaging Protocol) distributions for streaming media
content.
3. Edge Locations: CloudFront uses a network of edge locations located around the world to
cache and deliver content. Edge locations are geographically distributed points of presence
(PoPs) that act as caches for frequently accessed content. When a user requests content,
CloudFront delivers it from the nearest edge location, reducing latency and improving
performance.
4. Cache: CloudFront caches content at edge locations based on configurable cache behaviors. The
cache behaviors define how CloudFront should handle specific requests, including whether to
cache the content and for how long. Cached content is stored in the edge location's cache until it
expires or is evicted due to space constraints.
5. Content Delivery: When a user requests content, the request is routed to the nearest edge
location. If the requested content is already cached and has not expired, CloudFront delivers it
directly from the cache, resulting in low latency. If the content is not in the cache or has expired,
CloudFront retrieves it from the origin server, caches it in the edge location, and then delivers it
to the user.
6. SSL/TLS Encryption: CloudFront supports SSL/TLS encryption for secure content delivery.
You can configure CloudFront to use HTTPS to encrypt content in transit between the edge
locations and end users. CloudFront also supports the use of custom SSL/TLS certificates or
integrates with AWS Certificate Manager for managing SSL/TLS certificates.
7. Access Control: CloudFront provides various mechanisms for access control and security. You
can use AWS Identity and Access Management (IAM) to control who can create, configure, and
manage CloudFront distributions. Additionally, you can use CloudFront signed URLs or signed
cookies to restrict access to specific content or limit access duration.
8. Monitoring and Reporting: CloudFront integrates with AWS CloudWatch, which allows you
to monitor and gain insights into the performance and behavior of your CloudFront
distributions. CloudFront provides metrics, logs, and real-time data that you can use for
monitoring, troubleshooting, and optimizing the delivery of your content.
By leveraging the components and capabilities of CloudFront, you can distribute content globally with
low latency, high availability, and improved performance for your end users, enhancing their browsing
or streaming experience.
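A small, hedged example of working with an existing distribution from the AWS CLI (the
distribution ID below is a placeholder):
# List the CloudFront distributions in the account
aws cloudfront list-distributions
# Invalidate cached copies at the edge locations so the next request is fetched fresh from the origin
aws cloudfront create-invalidation --distribution-id E1A2B3C4D5E6F7 --paths "/*"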
AWS CloudFront offers several benefits for content delivery and performance optimization:
Low Latency and High Performance: CloudFront uses a network of globally distributed edge
locations to deliver content from the nearest edge location to end users, reducing latency and
improving performance. This results in faster load times and a better user experience.
Global Content Delivery: CloudFront's extensive network of edge locations spans across the
globe, allowing you to distribute your content to users worldwide. It helps reduce the distance
between users and your content, enabling faster delivery regardless of their geographical
location.
Scalability: CloudFront is highly scalable and can handle traffic spikes and sudden increases in
demand. It automatically scales to accommodate varying levels of traffic, ensuring that your
content remains accessible and responsive during peak usage periods.
Cost-Effective: CloudFront offers a pay-as-you-go pricing model, where you only pay for the
data transfer and requests made by your users. The pricing is based on the amount of data
transferred and the number of requests, allowing you to optimize costs based on your actual
usage.
Caching and Content Optimization: CloudFront caches your content at edge locations,
reducing the load on your origin server and improving response times. By caching content
closer to end users, CloudFront minimizes the need to fetch content from the origin server,
resulting in faster delivery.
Security: CloudFront supports various security features, including SSL/TLS encryption for
secure content delivery over HTTPS. You can also control access to your content using
CloudFront signed URLs or signed cookies, ensuring that only authorized users can access your
protected content.
Integration with AWS Services: CloudFront seamlessly integrates with other AWS services,
such as Amazon S3, Amazon EC2, and AWS Lambda. You can easily combine CloudFront
with these services to deliver dynamic, static, or streaming content, enabling a cohesive and
scalable architecture.
Snapshots:
A snapshot is a point-in-time copy of the data stored in an Amazon Elastic Block Store
(EBS) volume. It captures the data and configuration of the volume at the time the
snapshot is taken.
Snapshots are primarily used for backup, recovery, and data persistence. They allow you to
create a backup of your EBS volumes and protect against data loss.
Snapshots are incremental, meaning that after the initial snapshot, subsequent snapshots
only capture the changes since the previous snapshot. This helps in efficient storage
utilization and faster backup operations.
Snapshots are stored in Amazon S3, and you are charged based on the size of the data
stored in the snapshot.
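A hedged sketch of typical snapshot operations with the AWS CLI (the volume, snapshot, and
availability-zone identifiers are placeholders):
# Take a point-in-time snapshot of an EBS volume
aws ec2 create-snapshot --volume-id vol-0abc1234def567890 --description "nightly backup"
# List snapshots owned by this account
aws ec2 describe-snapshots --owner-ids self
# Restore by creating a new volume from the snapshot
aws ec2 create-volume --snapshot-id snap-0abc1234def567890 --availability-zone us-east-1a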
Here are some key details about AWS snapshots:
1. Point-in-time Copies
2. Incremental Backups
3. Fast and Easy Backups
4. Encryption
5. Cost-effective
6. Region-based
7. Flexible Recovery Options
AMI (Amazon Machine Image):
An AMI is a pre-configured image that contains the root file system, applications, and
configuration necessary to launch an EC2 instance. It captures the entire state of an EC2
instance at the time the AMI is created.
AMIs are used for creating and launching new EC2 instances with the same configuration
as the original instance. They provide a convenient way to replicate instances or create
templates for consistent deployments.
AMIs include the operating system, installed software, and any additional data or
configurations present on the instance's root volume.
AMIs can be public, shared with other AWS accounts, or privately owned by your
account. You can also create custom AMIs from existing EC2 instances or from snapshots.
AMIs are stored in Amazon S3 and are charged based on the storage size of the AMI and
any associated snapshots.
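A hedged example of creating and using a custom AMI via the AWS CLI (the instance ID, AMI ID,
and image name are placeholders):
# Create an AMI from an existing EC2 instance
aws ec2 create-image --instance-id i-0abc1234def567890 --name "web-server-v1" \
    --description "Baseline web server image"
# Launch a new instance from that AMI with the same configuration
aws ec2 run-instances --image-id ami-0abc1234def567890 --instance-type t2.micro --count 1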
Here are some key details about AWS AMIs:
1. Types of AMIs
2. Customization
3. Versioning
4. Storage
5. Security
6. Sharing
7. Licensing
AWS Scaling Plans:
1. Amazon EC2 Auto Scaling: This scaling plan is used to automatically adjust the
number of Amazon EC2 instances in an Auto Scaling group based on predefined
scaling policies. It allows you to define rules for scaling in and out based on metrics
such as CPU utilization, network traffic, or custom metrics. EC2 Auto Scaling helps
maintain application availability, optimize performance, and manage costs by
automatically adding or removing instances as needed.
2. Application Auto Scaling: Application Auto Scaling is a service that allows you to
automatically scale other AWS resources beyond EC2 instances. It supports scaling for
various services such as Amazon ECS (Elastic Container Service), DynamoDB,
Amazon Aurora, Amazon AppStream 2.0, and more. With Application Auto Scaling,
you can define scaling policies based on specific metrics and conditions related to the
respective service, enabling automatic scaling of resources to handle changes in
demand.
3. AWS Auto Scaling: AWS Auto Scaling provides a unified scaling experience across
multiple AWS services. It combines the capabilities of EC2 Auto Scaling, Application
Auto Scaling, and other scaling features to offer a comprehensive scaling solution.
AWS Auto Scaling allows you to define scaling policies across different services,
ensuring that your application components scale in a coordinated manner to meet
demand while optimizing resource utilization.
4. Amazon RDS Auto Scaling: RDS Auto Scaling allows you to automatically adjust the
capacity of Amazon RDS (Relational Database Service) instances based on predefined
scaling policies. It helps maintain optimal performance and availability of your
database by adding or removing RDS instances in response to changes in demand.
RDS Auto Scaling supports scaling based on metrics such as CPU utilization or
database connections.
AWS provides various scaling plans that allow you to scale your applications and infrastructure
in response to changing demand. Here are some of the different scaling plans in AWS:
1. Horizontal Scaling: Horizontal scaling is the process of adding more instances to your
application to handle increased traffic. This can be achieved using AWS Auto Scaling,
which automatically adjusts the number of EC2 instances based on the load on your
application.
2. Vertical Scaling: Vertical scaling is the process of increasing the capacity of an individual
instance to handle more traffic. This can be achieved by upgrading the instance type or
increasing the size of the instance's resources, such as CPU, memory, and storage.
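A minimal sketch of the EC2 Auto Scaling plan described above, assuming a launch template and
subnets already exist (all names and IDs below are hypothetical):
# Create an Auto Scaling group spanning two subnets
aws autoscaling create-auto-scaling-group --auto-scaling-group-name web-asg \
    --launch-template LaunchTemplateName=web-template,Version='1' \
    --min-size 1 --max-size 4 --desired-capacity 2 \
    --vpc-zone-identifier "subnet-0aaa1111,subnet-0bbb2222"
# Target-tracking policy: keep average CPU utilization around 50%
aws autoscaling put-scaling-policy --auto-scaling-group-name web-asg \
    --policy-name keep-cpu-50 --policy-type TargetTrackingScaling \
    --target-tracking-configuration '{"PredefinedMetricSpecification":{"PredefinedMetricType":"ASGAverageCPUUtilization"},"TargetValue":50.0}'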
Load balancing in AWS is a crucial component for achieving high availability, scalability, and
fault tolerance in your applications or services. AWS offers multiple load balancing services that
can distribute traffic across multiple resources to ensure efficient resource utilization and optimal
performance. The main load balancing services provided by AWS are:
Elastic Load Balancing (ELB): Elastic Load Balancing is a fully managed load balancing
service that distributes incoming traffic across multiple EC2 instances, containers, IP
addresses, or Lambda functions within a specific region. ELB automatically scales the load
balancer as traffic patterns change, ensuring that your application can handle varying
levels of traffic. AWS offers three types of ELB:
a. Classic Load Balancer (CLB): This is the traditional load balancer provided by AWS. It
operates at the transport layer (Layer 4) and can distribute traffic across EC2 instances.
b. Application Load Balancer (ALB): ALB operates at the application layer (Layer 7) and
provides advanced features such as content-based routing, path-based routing, and support
for HTTP/HTTPS protocols. It is suitable for modern web applications.
c. Network Load Balancer (NLB): NLB operates at the network layer (Layer 4) and is
designed to handle high volumes of traffic with ultra-low latencies. It is ideal for TCP,
UDP, and TLS traffic.
AWS Global Accelerator: AWS Global Accelerator improves the availability and
performance of your applications for global users by routing traffic through the AWS
global network infrastructure. It uses the AWS edge locations to direct traffic to the nearest
application endpoint, reducing latency and improving global application responsiveness.
AWS Application Auto Scaling: While not a load balancer itself, AWS Application Auto
Scaling complements load balancing by automatically adjusting the capacity of other AWS
resources based on defined scaling policies. It supports scaling for services like Amazon
ECS, DynamoDB, Aurora, and more, ensuring that the resources scale in sync with the
load.
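As a hedged sketch of setting up the Application Load Balancer described above with the AWS CLI
(the subnet, security group, VPC, and instance IDs, as well as the ARNs, are placeholders):
# Create an Application Load Balancer across two subnets
aws elbv2 create-load-balancer --name web-alb \
    --subnets subnet-0aaa1111 subnet-0bbb2222 --security-groups sg-0ccc3333
# Create a target group and register EC2 instances as targets
aws elbv2 create-target-group --name web-targets --protocol HTTP --port 80 --vpc-id vpc-0ddd4444
aws elbv2 register-targets --target-group-arn <target-group-arn> --targets Id=i-0abc1234def567890
# Forward incoming HTTP traffic on port 80 to the target group
aws elbv2 create-listener --load-balancer-arn <load-balancer-arn> --protocol HTTP --port 80 \
    --default-actions Type=forward,TargetGroupArn=<target-group-arn>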
Benefits
By leveraging these load balancing services in AWS, you can achieve benefits such as:
High availability: Load balancers distribute traffic across multiple resources, ensuring that if
one resource becomes unavailable, traffic is automatically routed to healthy resources,
minimizing downtime.
Scalability: Load balancers can dynamically scale the resources they distribute traffic to,
allowing your application to handle varying levels of traffic without manual intervention.
Fault tolerance: Load balancers monitor the health of resources and automatically route
traffic away from unhealthy resources, improving the fault tolerance of your application or
service.
Simplified management: AWS load balancing services are fully managed, meaning that
AWS handles the operational aspects such as capacity provisioning, scaling, and health
monitoring, allowing you to focus on your application logic.
Load Balancing Algorithms:
1. Round Robin: Incoming requests are distributed sequentially across the available servers in
turn, giving each server an approximately equal share of the traffic.
2. Least Connection: With the Least Connection algorithm, incoming requests are routed
to the server with the fewest active connections at the time the request is received. This
algorithm distributes the load based on the current load on the servers,
aiming to achieve a more balanced distribution.
3. IP Hash: In this algorithm, the client's IP address is used to determine which server
will handle the request. The IP address is hashed, and the resulting value is used to
select the server. This ensures that requests from the same IP address are consistently
routed to the same server, which can be useful for maintaining session persistence.
4. Least Time: The Least Time algorithm considers the response time of each server and
routes requests to the server with the lowest response time. This approach aims to
minimize the overall response time for the client.
5. Weighted Round Robin: In this algorithm, each server is assigned a weight that
corresponds to its processing capacity or performance. Servers with higher weights
receive a larger proportion of the traffic, enabling load balancing that takes into
account the server's capabilities.
6. Least Bandwidth: The Least Bandwidth algorithm routes requests to the server with
the least amount of current network traffic. This approach aims to balance the network
load across servers based on their available bandwidth.
It's important to note that not all load balancing algorithms are available in every load
balancing service. AWS load balancing services, such as Elastic Load Balancing (ELB) and
Application Load Balancer (ALB), provide built-in load balancing algorithms specific to each
service.
The choice of load balancing algorithm depends on various factors, including the nature of the
workload, the capabilities of the servers or resources, and the desired performance and
behavior of the application. AWS load balancing services typically offer configurable options
to select the appropriate algorithm or provide a default behavior that is suitable for most use
cases.
Types of Load Balancing
Application load balancing
Network load balancing
Global server load balancing
DNS load balancing
Docker
Docker is an open-source platform that allows you to automate the deployment, scaling, and
management of applications using containerization.
Containers are lightweight and isolated environments that package an application and its
dependencies, providing consistency across different computing environments.
1. Container: A container is a runnable instance of an image; it is the lightweight, isolated
environment in which the application and its dependencies actually run.
2. Image: An image is a read-only template used to create containers. It contains the application
code, runtime, and dependencies required to run the application. Docker images are built from a
set of instructions called a Dockerfile.
3. Dockerfile: A Dockerfile is a text file that contains a set of instructions to build a Docker image.
It specifies the base image, application code, dependencies, and other configurations needed for
the container.
4. Docker Hub: Docker Hub is a public registry that hosts thousands of Docker images. It allows
users to share and download pre-built Docker images for various applications, frameworks, and
services.
6. Docker Compose: Docker Compose is a tool used to define and manage multi-container Docker
applications. It allows you to describe the services, networks, and volumes required for your
application using a YAML file.
7. Orchestration: Docker Swarm and Kubernetes are popular container orchestration platforms
that help manage and scale containerized applications across multiple hosts or nodes. They
provide features like service discovery, load balancing, scaling, and high availability.
Using Docker, developers can create consistent development and production environments, simplify
application deployment, and improve scalability. It also promotes collaboration and sharing of software
components through container images. Docker has gained significant popularity in the software
development industry due to its ease of use, portability, and resource efficiency.
A Docker container enters the running state when the docker run command is used to create and start it.
The docker start command is used to put a stopped container back into the running state.
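These state transitions can be illustrated with standard Docker CLI commands (the container name
below is arbitrary):
docker create --name web nginx     # created state (not yet running)
docker start web                   # created/stopped -> running
docker stop web                    # running -> stopped
docker start web                   # stopped -> running again
docker rm -f web                   # remove the container
docker ps -a                       # list containers and their current states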
Docker Architecture
In the traditional virtualization architecture, the layers are as follows:
The server is the physical server that is used to host multiple virtual machines.
The Host OS is the base machine such as Linux or Windows.
The Hypervisor is either VMWare or Windows Hyper V that is used to host virtual
machines.
You would then install multiple operating systems as virtual machines on top of the
existing hypervisor as Guest OS.
You would then host your applications on top of each Guest OS.
The new generation of virtualization enabled by Docker changes this stack.
Let's have a look at the various layers.
The server is the physical server that is used to host multiple virtual machines. So this
layer remains the same.
The Host OS is the base machine such as Linux or Windows. So this layer remains the
same.
Now comes the new layer, which is the Docker engine. Workloads that previously ran as separate
virtual machines now run as Docker containers on this engine.
All of the Apps now run as Docker containers.
The clear advantage of this architecture is that you do not need extra hardware or a separate Guest
OS for each application; everything runs as Docker containers.
Containers:
Containers are lightweight, standalone executable packages that include
everything needed to run an application, including code, runtime, system tools,
libraries, and settings.
Containers provide a consistent, reliable, and efficient way to run applications
across different environments, from development to production.
Containers are similar to virtual machines in that they provide a way to isolate an
application and its dependencies from the host system.
However, containers are more lightweight and efficient than virtual machines
because they share resources with the host system, rather than requiring their own
operating system and virtual hardware.
Containers can be used for a wide range of applications and use cases, including:
1. Development and testing: Containers provide a consistent, reproducible environment for
developing and testing applications, making it easier to identify and fix bugs and
compatibility issues.
2. Deployment and scaling: Containers make it easy to deploy and scale applications by
providing a consistent environment across different systems and environments.
3. Microservices: Containers are well-suited for building and deploying microservices, which
are small, independent components that work together to form a larger application.
4. Legacy applications: Containers can be used to modernize and containerize legacy
applications, making them more portable, scalable, and efficient.
Usage of Containers:
Containers have gained widespread adoption in various areas of software development and
deployment. Here are some common use cases for containers:
Application Deployment:
Containers provide a consistent and portable runtime environment for applications.
Developers can package their applications along with all the necessary dependencies and
configurations into a container image.
These images can be easily deployed across different environments, including development,
testing, staging, and production, ensuring that the application runs consistently across all
stages.
Microservices Architecture:
Containers are well-suited for building and deploying microservices-based architectures.
Each microservice can be packaged and deployed as a separate container, enabling
independent development, scalability, and deployment of individual components.
Containers facilitate the decoupling of services, making it easier to manage and scale
complex distributed applications.
These are just a few examples of how containers are used in various aspects of software
development and deployment. The flexibility, portability, and scalability offered by containers make
them a powerful tool for modern application development and deployment practices.
Terminology:
Some common terminology related to containers:
2. Container Image: A read-only template used to create containers. It includes the application
code, runtime, libraries, and dependencies required to run the application.
5. Container Registry: A repository that stores and distributes container images. Docker Hub is a
popular public container registry, but private registries like Amazon ECR, Google Container
Registry, or Azure Container Registry are also commonly used.
9. Volume: A mechanism in containers that allows data to persist beyond the lifecycle of the
container. Volumes can be used to store and share data between containers or between
containers and the host system.
10. Container Networking: The networking infrastructure that enables communication between
containers. Containers can be connected through virtual networks, allowing them to
communicate with each other or with external systems.
11. Container Orchestration Platform: Software platforms that automate the deployment, scaling,
and management of containers. Examples include Kubernetes, Docker Swarm, and Apache
Mesos.
12. Service Discovery: The process of automatically detecting and registering available services
within a container orchestration platform. It enables containers to find and communicate with
each other dynamically.
13. Load Balancing: The distribution of incoming network traffic across multiple containers to
optimize resource utilization and improve application performance. Load balancers can be
integrated with container orchestration platforms.
14. Auto-scaling: The capability of dynamically adjusting the number of containers based on
workload demand. Auto-scaling helps ensure that the application can handle increased traffic or
workload without manual intervention.
15. Immutable Infrastructure: An approach where containers are treated as disposable and are not
modified once deployed. Instead, any updates or changes are made by creating new container
images and deploying them.
Running a static website using Docker is a great way to create a portable and scalable web hosting
solution.
Here are the steps to run a static site using Docker:
1. Create a Dockerfile: The first step is to create a Dockerfile that specifies the base image,
copies the static files into the container, and exposes the container port.
Here's an example of a Dockerfile for a static site built with HTML and CSS:
# Specify the base image
FROM nginx:latest
# Copy the static files into the container
COPY . /usr/share/nginx/html
# Expose port 80
EXPOSE 80
2. Build the Docker image: Use the `docker build` command to build the Docker image from the
Dockerfile. The command should be run from the directory containing the Dockerfile:
docker build -t my-static-site .
The `-t` flag sets the image name and the `.` specifies the build context (the current directory).
3. Run the Docker container: Once the Docker image is built, run the container using the `docker
run` command:
docker run -d -p 8080:80 my-static-site
The `-d` flag runs the container in detached mode, `-p` maps the container port 80 to the host port
8080, and `my-static-site` is the name of the Docker image.
4. Access the website: Open a web browser and go to `http://localhost:8080` to view the static
website.
By following these simple steps, you can easily run a static website using Docker.
Docker Image
In Docker, images are the building blocks for containers. An image is a read-only template that
contains the necessary files, dependencies, and configurations to run a specific application or service
within a container.
Image Layers: Docker images are composed of multiple layers. Each layer represents a specific
modification or addition to the base image. Layering allows for efficient image sharing and
reusability, as common layers can be shared among multiple images.
Base Image: A base image serves as the starting point for creating other images. It typically
contains the minimal operating system or runtime environment required for running a specific type
of application. Examples of base images include official language runtimes (e.g., Python, Node.js)
or distribution-specific images (e.g., Ubuntu, Alpine).
Dockerfile: A Dockerfile is a text file that contains a series of instructions for building a Docker
image. It specifies the base image, additional software installations, file copying, environment
variables, exposed ports, and more. Docker uses the instructions in the Dockerfile to build a new
image layer by layer.
Image Repository: Docker images can be stored and managed in image repositories. Docker Hub is
a popular public image repository that hosts thousands of pre-built images. Private repositories, such
as Amazon ECR, Google Container Registry, and Azure Container Registry, allow organizations to
securely store and distribute their own Docker images.
Image Tag: An image tag is a label attached to an image to distinguish different versions or variants
of the same image. Tags are typically used to represent different versions, such as "latest," "v1.0," or
specific release numbers. When pulling or running an image, you can specify the tag to retrieve the
desired version.
Image Pull: To use an image, it needs to be pulled from a registry to the local Docker environment.
The docker pull command is used to download an image from a specified repository. If the image is
not found locally, Docker will fetch it from the repository.
Image Build: Docker builds images using the docker build command, which reads the instructions
from a Dockerfile and creates a new image based on those instructions. The build process involves
downloading necessary layers, executing the instructions, and generating a new image.
Image Layers and Caching: Docker utilizes layer caching during the image build process. If a
Dockerfile instruction has not changed since a previous build, Docker can reuse the corresponding
layer from cache. This caching mechanism speeds up subsequent builds, as unchanged layers do not
need to be rebuilt.
Image Tagging and Pushing: Once an image is built, it can be tagged with a specific version or
variant and pushed to a repository. The docker tag command is used to assign a new tag to an image,
and the docker push command is used to upload the image to a repository, making it available for
others to pull and use.
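A short sketch of the pull, build, tag, and push workflow described above (the registry and
repository names are hypothetical):
# Pull a public base image from Docker Hub
docker pull nginx:latest
# Build a local image from a Dockerfile in the current directory
docker build -t my-app:latest .
# Tag the image for a remote repository and push it
docker tag my-app:latest registry.example.com/my-team/my-app:v1.0
docker push registry.example.com/my-team/my-app:v1.0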
Docker images are fundamental to the containerization process, allowing for the reproducibility,
portability, and sharing of containerized applications and services. They enable developers to
package and distribute their applications, ensuring consistency across different environments and
simplifying the deployment process.
A Dockerfile is a text file that contains a set of instructions for building a Docker image. These
instructions define the steps to create the image, such as specifying the base image, copying files,
installing dependencies, setting environment variables, exposing ports, and executing commands.
Here's an overview of the Dockerfile syntax and some commonly used instructions:
Base image: The first line of a Dockerfile specifies the base image on which your image will be
built. It defines the starting point for your image. Example: FROM ubuntu:20.04.
Copy files: The COPY instruction is used to copy files and directories from the build context (the
directory containing the Dockerfile) to the image. Example: COPY app.py /app/.
Set working directory: The WORKDIR instruction sets the working directory for subsequent
instructions. Example: WORKDIR /app.
Install dependencies: Use RUN instruction to execute commands during the image build process.
You can install dependencies, run package managers, or perform any other necessary setup tasks.
Example: RUN apt-get update && apt-get install -y python3.
Expose ports: The EXPOSE instruction documents the ports that the container listens on at
runtime. It doesn't actually publish the ports. Example: EXPOSE 8080.
Set environment variables: The ENV instruction sets environment variables in the image.
Example: ENV MY_VAR=my_value.
Execute commands: The CMD instruction specifies the default command to run when a container
is created from the image. It can be overridden when starting the container. Example: CMD
["python3", "app.py"].
Build the image: Use the docker build command to build the image based on the Dockerfile.
Example: docker build -t my-image .
Once the Dockerfile is ready, you can build the Docker image using the docker build command and
run containers based on that image using the docker run command.
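Putting the instructions above together, a minimal example Dockerfile for a hypothetical Python
application (the file name app.py and the installed package are illustrative only):
# Start from a base image
FROM ubuntu:20.04
# Copy the application code from the build context into the image
COPY app.py /app/
# Set the working directory for the following instructions
WORKDIR /app
# Install dependencies during the build
RUN apt-get update && apt-get install -y python3
# Document the port the container listens on at runtime
EXPOSE 8080
# Set an environment variable in the image
ENV MY_VAR=my_value
# Default command when a container starts from this image
CMD ["python3", "app.py"]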
Docker on AWS
Docker can be used on AWS (Amazon Web Services) to deploy and manage containers in the cloud.
AWS provides several services and features that integrate well with Docker, allowing you to build,
run, and scale containerized applications effectively.
Here are some key AWS services and features related to Docker:
1. Amazon Elastic Container Service (ECS): ECS is a fully managed container orchestration
service provided by AWS. It allows you to run Docker containers without managing the
underlying infrastructure. You can define task definitions that specify the container images,
resources, networking, and other configurations. ECS automatically handles container
deployment, scaling, and load balancing.
2. Amazon Elastic Kubernetes Service (EKS): EKS is a managed Kubernetes service on AWS. It
simplifies the deployment, management, and scaling of Kubernetes clusters. You can use EKS to
run Docker containers within Kubernetes pods, taking advantage of the rich ecosystem of
Kubernetes tools and features.
4. AWS Batch: AWS Batch is a service for running batch computing workloads, including
containerized applications. It provides a managed environment for executing jobs at scale, and
you can use Docker containers as the execution environment for your batch jobs.
6. Amazon Elastic Container Registry (ECR): ECR is a fully managed Docker container registry
provided by AWS. It allows you to store, manage, and deploy container images. You can push
your Docker images to ECR and use them in ECS, EKS, or other container orchestration
platforms.
7. AWS Cloud Development Kit (CDK): CDK is an open-source development framework that
allows you to define cloud infrastructure using familiar programming languages. You can use
CDK to define Docker containers, ECS clusters, networking, and other AWS resources in code,
providing a more programmatic and repeatable way of managing your Docker deployments on
AWS.
These are just a few examples of how Docker can be used on AWS. AWS provides a wide range of
services that can be integrated with Docker to build scalable, resilient, and cost-effective
containerized applications. The choice of services depends on your specific use case and
requirements.
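As a hedged sketch of the ECR workflow mentioned above, pushing a local image so it can be used by
ECS or EKS (the account ID, region, and repository name are placeholders):
# Create an ECR repository
aws ecr create-repository --repository-name my-app
# Authenticate the local Docker client against the ECR registry
aws ecr get-login-password --region us-east-1 | \
    docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
# Tag and push the image
docker tag my-app:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest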
Docker Network
Docker provides a networking feature that allows containers to communicate with each other and
with external networks. Docker networking enables containers to be connected together, isolated
from other containers or networks, and exposed to the host system or other containers.
Default Network: When Docker is installed, it creates a default bridge network called bridge.
Containers that are started without specifying a network explicitly are connected to this network by
default. Containers on the same bridge network can communicate with each other using IP
addresses.
Network Drivers: Docker manages container networking through pluggable network drivers.
These drivers are responsible for creating and configuring the network interfaces of containers.
Docker supports multiple drivers, including bridge, overlay, macvlan, host, and more.
Bridge Network: A bridge network is a private network internal to the Docker host. It allows
containers to communicate with each other using IP addresses. Containers on the same bridge
network can discover each other using their container names or IP addresses. By default, Docker
creates a bridge network called bridge when it is installed.
Host Network: In the host network mode, a container shares the network namespace with the
Docker host. This means the container uses the host's network stack and can directly access the
host's network interfaces. Containers in host network mode have the same network configuration as
the host system and can use the host's IP address.
User-defined Network: Docker allows you to create user-defined networks to isolate containers and
control their connectivity. User-defined networks provide network segmentation and firewall-like
rules to control communication between containers. Containers can be attached to multiple user-
defined networks, allowing for more complex network setups.
These are some fundamental concepts related to Docker networking. Understanding Docker
networking allows you to configure communication between containers, connect containers to
external networks, and build complex network setups for your containerized applications.
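A small example of a user-defined bridge network (the network and container names are arbitrary):
# Create a user-defined bridge network
docker network create app-net
# Start two containers attached to that network
docker run -d --name web --network app-net nginx
docker run -d --name cache --network app-net redis
# Containers on app-net can reach each other by name, e.g. web can connect to cache:6379
docker network inspect app-net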
Docker Compose
Docker Compose is a tool that allows you to define and manage multi-container Docker
applications. It provides a simple way to define the services, networks, and volumes required for
your application using a YAML file. With Docker Compose, you can easily spin up and tear down
complex application stacks with just a few commands.
Here are the key features and concepts related to Docker Compose:
Compose File: A Compose file, usually named docker-compose.yml, is used to define the services,
networks, and volumes for your application. It is written in YAML format and specifies the
configuration options for each service in your application stack.
Volumes: Volumes in Docker Compose allow you to persist data generated by your containers.
Volumes can be created and attached to services, ensuring that data is stored outside of the
container's filesystem. This enables data to be retained even if containers are destroyed or recreated.
Environment Variables: Docker Compose allows you to specify environment variables for your
services. Environment variables can be used to configure your application's behavior, pass secrets,
or provide runtime parameters. Environment variables can be set directly in the Compose file or
loaded from external files.
Building Images: Docker Compose supports building custom images for your services using
Dockerfiles. You can specify a build context and a Dockerfile for a service, and Compose will build
the image before starting the container.
Service Dependencies: Docker Compose allows you to define dependencies between services. You
can specify that one service depends on another, and Compose will start the services in the correct
order, ensuring that dependencies are resolved before a service is started.
Scaling Services: Docker Compose makes it easy to scale services horizontally. You can define the
desired number of replicas for a service, and Compose will create and manage the specified number
of containers.
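A minimal docker-compose.yml sketch of a two-service stack (the image choices and port mapping are
illustrative only):
version: "3.8"
services:
  web:
    build: .            # build the image from the Dockerfile in this directory
    ports:
      - "8080:80"       # map host port 8080 to container port 80
    depends_on:
      - cache           # start the cache service before web
  cache:
    image: redis:latest
The stack can then be started with docker compose up -d and stopped with docker compose down.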
The development workflow in AWS (Amazon Web Services) typically involves several stages and
tools to facilitate the development, testing, and deployment of applications.
Environment Setup: Set up your development environment by installing the necessary tools and
SDKs provided by AWS. This may include the AWS CLI (Command Line Interface), AWS SDKs
for your programming language, and any additional development tools or IDEs.
Code Development: Write your application code using your preferred programming language and
development environment. This may include writing serverless functions, building web applications,
or developing microservices.
Version Control: Use a version control system like Git to manage your codebase. Create a Git
repository and commit your code changes regularly. It's a best practice to use branches and pull
requests to manage feature development, bug fixes, and code reviews.
Infrastructure as Code: Use infrastructure-as-code (IaC) tools like AWS CloudFormation or AWS
CDK (Cloud Development Kit) to define your application's infrastructure in code. This allows you
to provision and manage AWS resources, such as EC2 instances, databases, load balancers, and
security groups, using version-controlled templates or scripts.
Testing and Quality Assurance: Implement a testing strategy that includes unit tests, integration
tests, and end-to-end tests. Use AWS testing services like AWS CodeBuild, AWS Device Farm, or
AWS Lambda for automated testing. Additionally, consider implementing code review processes
and code analysis tools to ensure code quality and adherence to best practices.
Deployment and Staging: Deploy your application to staging environments for further testing and
validation. Use AWS services like AWS Elastic Beanstalk, AWS App Runner, or AWS ECS
(Elastic Container Service) to deploy and manage your applications. You can also leverage services
like AWS CloudFront or Amazon S3 for content delivery and hosting static assets.
Monitoring and Logging: Implement monitoring and logging solutions to gain insights into the
health, performance, and behavior of your application. Use AWS services like AWS CloudWatch,
AWS X-Ray, or AWS CloudTrail for monitoring, logging, and tracing application activities.
Configure alarms and notifications to proactively detect and respond to issues.
Scaling and Optimization: Monitor the performance of your application and optimize it for
scalability and cost efficiency. Utilize AWS Auto Scaling to automatically adjust resource capacity
based on demand. Analyze metrics and logs to identify performance bottlenecks and optimize the
application's infrastructure and code accordingly.
Security and Compliance: Implement security measures to protect your application and data.
Follow AWS security best practices and leverage AWS services like AWS Identity and Access
Management (IAM), AWS Secrets Manager, AWS Key Management Service (KMS), and AWS
Certificate Manager to manage access, secrets, encryption, and SSL/TLS certificates.
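As a minimal illustration of the infrastructure-as-code step above, a hedged CloudFormation
template that declares a single S3 bucket, deployed with the AWS CLI (the file, stack, and resource
names are hypothetical):
# template.yml
Resources:
  AppDataBucket:
    Type: AWS::S3::Bucket
# Deploy (or update) the stack from the template
aws cloudformation deploy --template-file template.yml --stack-name dev-workflow-demo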
This is a high-level overview of a typical development workflow in AWS. The specific tools and
services you use may vary based on your application requirements and development preferences.
AWS offers a wide range of services and features to support various development methodologies,
deployment models, and scalability needs.
Amazon EC Services
AWS (Amazon Web Services) provides a comprehensive suite of EC (Elastic Compute) services
that offer scalable and flexible compute resources for running applications and workloads.
1. Amazon EC2 (Elastic Compute Cloud): EC2 provides resizable virtual servers, called instances, in
the cloud. You choose an instance type that matches your CPU, memory, storage, and networking
needs, and you pay only for the capacity you actually use.
2. Amazon EC2 Auto Scaling: EC2 Auto Scaling helps you maintain the desired number of EC2
instances in an EC2 Auto Scaling group automatically. It automatically scales the number of
instances based on predefined scaling policies, ensuring that your application can handle varying
levels of traffic and demand. EC2 Auto Scaling also integrates with other AWS services, such as
Amazon CloudWatch, to provide dynamic scaling capabilities.
3. AWS Lambda: AWS Lambda is a serverless compute service that allows you to run code
without provisioning or managing servers. You can write your code in various programming
languages and upload it to Lambda, which then handles the underlying infrastructure and
automatically scales your code in response to incoming requests. Lambda is commonly used for
executing short-lived functions and building event-driven architectures.
4. Amazon Elastic Container Service (ECS): ECS is a fully managed container orchestration
service that allows you to run Docker containers in the cloud. It provides a scalable and secure
platform for deploying, managing, and scaling containerized applications. ECS supports both
EC2 launch type, where containers run on EC2 instances, and Fargate launch type, which allows
you to run containers without managing the underlying infrastructure.
5. Amazon Elastic Kubernetes Service (EKS): EKS is a fully managed Kubernetes service
provided by AWS. It simplifies the deployment, management, and scaling of Kubernetes
clusters. With EKS, you can run containerized applications using Kubernetes and leverage the
rich ecosystem of Kubernetes tools and features. EKS integrates with other AWS services and
offers native integration with AWS Fargate for serverless container deployments.
6. AWS Batch: AWS Batch is a fully managed service for running batch computing workloads. It
allows you to execute jobs on EC2 instances and automatically provisions the necessary
resources based on your job's requirements. AWS Batch provides a managed environment for
scheduling, monitoring, and scaling batch jobs, making it suitable for a wide range of use cases,
including data processing, scientific simulations, and analytics.
7. AWS Outposts: AWS Outposts brings AWS infrastructure and services to your on-premises
data center or edge location. It extends the capabilities of AWS services, including EC2, EKS,
and ECS, to run locally on Outposts hardware. This allows you to leverage AWS services and
manage your on-premises and cloud workloads consistently.
These are just a few examples of the EC services offered by AWS. Each service provides specific
capabilities and features to cater to different use cases and workload requirements. AWS EC
services offer the flexibility, scalability, and reliability needed to build and run a wide range of
applications in the cloud.
Introduction, Test Driven Development, Continuous Integration, Code coverage, Best Practices,
Virtual Machines vs Containers, Rolling Deployments, Continuous Deployment, Auto Scaling.
Case Study: Open Stack, Cloud based ML Solutions in Healthcare.
Introduction
DevOps, short for Development and Operations, is a collaborative approach to software
development that emphasizes communication, collaboration, and integration between software
developers and IT operations teams.
It aims to improve the efficiency and quality of software delivery by breaking down silos and
fostering a culture of shared responsibility and continuous improvement.
In traditional software development processes, development and operations teams often work in
isolation, leading to issues such as slow and error-prone deployments, lack of visibility, and
frequent miscommunication.
DevOps aims to address these challenges by promoting cross-functional collaboration and
automation.
2. Continuous Integration and Continuous Delivery (CI/CD): DevOps promotes the use of
automated tools and practices for integrating code changes frequently and delivering software in
small, incremental releases. CI/CD pipelines automate the building, testing, and deployment
processes, allowing teams to deliver software faster and with higher quality.
4. Automation: Automation plays a crucial role in DevOps. By automating repetitive and manual
tasks, teams can reduce errors, improve efficiency, and focus on higher-value activities.
Automation can include tasks like testing, deployment, monitoring, and infrastructure
provisioning.
5. Monitoring and Feedback: DevOps emphasizes the importance of monitoring software and
infrastructure in production to gain insights into performance, reliability, and user experience.
Monitoring helps identify issues and provides feedback to guide improvements in future
development cycles.
Adopting DevOps practices can result in benefits such as faster time to market, improved software
quality, increased collaboration, better resource utilization, and more reliable and resilient systems.
However, implementing DevOps requires organizational buy-in, cultural shifts, and investment in
appropriate tools and training to be successful.
Why DevOps?
Before going further, we need to understand why we need DevOps over other methods.
DevOps History
In 2009, the first conference named DevOpsdays was held in Ghent, Belgium. The Belgian
consultant Patrick Debois founded the conference.
In 2012, the State of DevOps report was conceived and launched by Alanna Brown at Puppet.
In 2014, the annual State of DevOps report was published by Nicole Forsgren, Jez Humble,
Gene Kim, and others. They found that DevOps adoption was accelerating.
In 2015, Nicole Forsgren, Gene Kim, and Jez Humble founded DORA (DevOps Research and
Assessment).
In 2018, Nicole Forsgren, Jez Humble, and Gene Kim published "Accelerate: The Science of Lean
Software and DevOps: Building and Scaling High Performing Technology Organizations".
Features of DevOps
1) Continuous Development
This phase involves the planning and coding of the software. The vision of the project is
decided during the planning phase, and the developers then begin writing the code for the
application. No special DevOps tools are required for planning, but there are several tools
for maintaining the code.
2) Continuous Integration
This stage is the heart of the entire DevOps lifecycle. It is a software development practice in
which developers are required to commit changes to the source code more frequently. This may
be on a daily or weekly basis. Every commit is then built, which allows early detection of
problems if they are present. Building the code involves not only compilation but also
unit testing, integration testing, code review, and packaging.
The code supporting new functionality is continuously integrated with the existing code.
Therefore, there is continuous development of software. The updated code needs to be
integrated continuously and smoothly with the systems to reflect changes to the end-
users.
Jenkins is a popular tool used in this phase. Whenever there is a change in the Git
repository, Jenkins fetches the updated code and prepares a build of that code, which is
an executable file in the form of a WAR or JAR. This build is then forwarded to the test
server or the production server.
3) Continuous Testing
In this phase, the developed software is continuously tested for bugs. For continuous testing,
automation testing tools such as TestNG, JUnit, Selenium, etc are used. These tools allow QAs
to test multiple code-bases thoroughly in parallel to ensure that there is no flaw in the
functionality. In this phase, Docker Containers can be used for simulating the test environment.
Selenium does the automation testing, and TestNG generates the reports. This entire testing
phase can be automated with the help of a Continuous Integration tool such as Jenkins.
Automation testing saves a lot of time and effort compared with executing the tests
manually. Apart from that, report generation is a big plus. The task of evaluating the test cases
that failed in a test suite gets simpler. Also, we can schedule the execution of the test cases at
predefined times. After testing, the code is continuously integrated with the existing code.
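As a small illustration of automated functional testing, the sketch below uses Selenium's Python
bindings rather than the Java tools (TestNG/JUnit) named above; the URL and the title check are
hypothetical and stand in for a real application under test.

# Sketch of an automated browser test using Selenium's Python bindings.
# Requires the selenium package and a locally available browser.
from selenium import webdriver

def test_homepage_title():
    driver = webdriver.Chrome()            # launch a browser session
    try:
        driver.get("https://example.com")  # hypothetical application URL
        assert "Example" in driver.title   # verify the expected page title
    finally:
        driver.quit()                      # always release the browser

if __name__ == "__main__":
    test_homepage_title()
    print("Homepage title test passed")

A CI tool such as Jenkins can run tests like this on every commit and collect the results into a
report.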
4) Continuous Monitoring
In this phase, the behavior of the application is continuously monitored while it is in use.
Monitoring information may take the form of documentation files or large-scale data about the
application's parameters gathered during continuous use. System errors such as an unreachable
server or low memory are detected and resolved in this phase, which helps maintain the security
and availability of the service.
5) Continuous Feedback
The application development is consistently improved by analyzing the results from the
operations of the software. This is carried out by placing the critical phase of constant feedback
between the operations and the development of the next version of the current software
application.
Continuity is the essential factor in DevOps because it removes the unnecessary steps otherwise
needed to take a software application from development, put it to use, find its issues, and then
produce a better version. Without this continuity, the efficiency that the application could
achieve is lost and the number of interested customers falls.
6) Continuous Deployment
In this phase, the code is deployed to the production servers. Also, it is essential to ensure that
the code is correctly used on all the servers.
The new code is deployed continuously, and configuration management tools play an
essential role in executing tasks frequently and quickly.
Here are some popular tools which are used in this phase, such as Chef, Puppet, Ansible,
and SaltStack.
Containerization tools are also playing an essential role in the deployment phase.
Vagrant and Docker are popular tools that are used for this purpose. These tools help to
produce consistency across development, staging, testing, and production environment.
They also help in scaling up and scaling down instances softly.
Containerization tools help to maintain consistency across the environments where the
application is tested, developed, and deployed. This greatly reduces the chance of errors or
failures in the production environment, because the same dependencies and packages used in
the testing, development, and staging environments are packaged and replicated. It also makes
the application easy to run on different computers.
It is clear from the discussion that continuity is the critical factor in DevOps: it removes
the steps that slow down development, delay the detection of issues, and push a better
version of the product out only after several months.
With DevOps, we can make any software product more efficient and increase the overall
number of interested customers.
DevOps Workflow:
DevOps Tools:
Test Driven Development (TDD)
1. Write a Test: In TDD, you start by writing a test that defines the desired behavior of a small
piece of functionality. This test is initially expected to fail since the corresponding code hasn't
been implemented yet.
2. Run the Test: The next step is to run the test and observe it fail. This step verifies that the test
is correctly detecting the absence of the desired functionality.
3. Write the Code: Now, you implement the minimum amount of code necessary to make the test
pass. The focus is on writing the simplest and most straightforward solution to fulfill the test's
requirements.
4. Run the Test Again: After writing the code, you rerun the test suite to verify that the new test
you wrote passes. At this point, you should have at least one passing test.
5. Refactor the Code: With passing tests, you can refactor the code to improve its design without
changing its behavior. This step ensures that the code remains clean, maintainable, and adheres
to best practices.
6. Repeat: The process is repeated for the next small piece of functionality. Each new test exposes
requirements for the code that you incrementally implement until all the desired functionality is
fulfilled.
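A minimal sketch of one TDD cycle in Python, using the standard unittest module. The add()
function and its tests are illustrative assumptions; in practice the test is written (and seen to
fail) before add() exists.

import unittest

# Step 3 of the cycle: the simplest code that makes the test pass.
def add(a, b):
    return a + b

# Steps 1-2: the tests written first; they fail until add() is implemented.
class TestAdd(unittest.TestCase):
    def test_adds_two_numbers(self):
        self.assertEqual(add(2, 3), 5)

    def test_adds_negative_numbers(self):
        self.assertEqual(add(-1, -1), -2)

if __name__ == "__main__":
    unittest.main()

Once the tests pass, the code can be refactored safely and the cycle repeats. This red-green-
refactor loop is what produces the benefits described next.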
Better Code Quality: TDD encourages developers to write code that is testable, modular, and loosely
coupled. The focus on writing tests first helps catch bugs early in the development process, resulting in
higher code quality.
Increased Confidence: By having comprehensive tests, developers gain confidence in making changes
to the codebase. Tests act as safety nets, allowing developers to refactor or add new features without
fear of breaking existing functionality.
Improved Design: TDD promotes good design principles such as separation of concerns and single
responsibility. By refactoring the code after each test passes, developers can continuously improve the
design, making it more maintainable and extensible.
Faster Debugging: When a test fails, it provides a clear indication of which specific functionality is
not working as expected. This speeds up the debugging process and helps pinpoint issues more
accurately.
Documentation and Specification: The tests act as living documentation of the codebase. They
provide examples of how the code should behave and serve as executable specifications for future
development and maintenance.
It's important to note that TDD is not a silver bullet and may not be suitable for all scenarios. Its
effectiveness can vary depending on the nature of the project, team dynamics, and other factors.
However, when used appropriately, TDD can be a valuable practice in software development, leading
to more robust and reliable code.
Continuous Integration
Continuous Integration (CI) is a software development practice that involves frequently
integrating code changes from multiple developers into a shared repository.
The main goal of CI is to detect integration issues and conflicts early by automatically building
and testing the software with each code commit.
In a CI workflow, developers integrate their code changes into a central version control system,
such as Git, multiple times throughout the day.
This triggers an automated build process, where the code is compiled, dependencies are
resolved, and the application or software is built. Following the build, a suite of automated tests
is executed to ensure that the integrated changes haven't introduced any regressions or defects.
Automated Build and Test: CI relies on automated tools and scripts to build the software from source
code and run a suite of tests. This automation enables fast and reliable feedback on the health and
stability of the integrated code.
Early Detection of Issues: By integrating code frequently and running tests automatically, CI allows
for the early detection of integration issues, conflicts, and bugs. This helps identify and fix problems
when they are smaller and easier to address.
Rapid Feedback Loop: CI provides developers with rapid feedback on the status of their code
changes. They receive immediate notifications if their changes break the build or fail any tests. This
short feedback loop enables quick identification and resolution of issues.
Code Quality and Stability: Continuous Integration helps maintain high code quality and stability. It
ensures that all code changes are built and tested in a consistent and repeatable manner, reducing the
risk of deploying faulty or unstable code to production.
To implement Continuous Integration effectively, teams typically leverage CI/CD tools such as
Jenkins, Travis CI, CircleCI, or GitLab CI/CD. These tools automate the build and test processes,
provide reporting and notifications, and integrate with version control systems.
It's worth noting that while Continuous Integration focuses on code integration and automated testing,
Continuous Delivery (CD) and Continuous Deployment (CD) expand on CI by automating the entire
software delivery pipeline, including deployment to production environments. Together, CI/CD
practices contribute to a more streamlined and efficient software development process.
Key practices that support Continuous Integration include the following (a minimal build-and-test
sketch follows this list):
Automating builds
Automating testing
A single source code repository
Visibility of the entire process
Real-time code access to everyone in the team
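As an illustration of the automated build-and-test idea, the sketch below shows a tiny CI step
written in Python: it runs the project's unit tests and fails the build (non-zero exit code) if any
test fails. The tests/ directory name is an assumption; real pipelines normally express this in a
CI tool's own configuration (Jenkins, GitLab CI, and so on).

# Minimal CI build-and-test step: run the unit tests and fail the build
# if any test fails. A CI server would execute this on every commit.
import subprocess
import sys

def run_tests():
    result = subprocess.run(
        [sys.executable, "-m", "unittest", "discover", "-s", "tests"],
        capture_output=True, text=True,
    )
    print(result.stdout)
    print(result.stderr)
    return result.returncode

if __name__ == "__main__":
    sys.exit(run_tests())   # non-zero exit marks the build as failed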
Code Coverage
Code coverage is typically measured by running tests against the software and collecting
information on which parts of the code were executed during the test run. The collected data is then
used to calculate the coverage metrics.
1. Line Coverage: Line coverage measures the percentage of lines of code that are executed
during the test run. It indicates whether each line of code has been executed or not.
2. Branch Coverage: Branch coverage measures the percentage of branches or decision points
in the code that are executed during the test run. It checks if both the true and false branches
of each decision point have been executed.
Code coverage helps assess the quality and thoroughness of the test suite. Higher code coverage
indicates that a larger portion of the code has been tested, potentially leading to the detection of
more bugs and ensuring that the code is exercised in various scenarios.
However, code coverage alone does not guarantee the absence of bugs or comprehensive
testing. It is possible to have high code coverage but still miss critical scenarios or edge cases.
Code coverage should be used as a tool to guide testing efforts, improve the effectiveness of test
suites, and identify areas that require more attention.
To measure code coverage, various tools and frameworks exist, such as JaCoCo, Cobertura,
Istanbul, and OpenCover, which provide insights into the coverage metrics of the codebase.
These tools can be integrated into the build process or test automation framework to generate
reports on code coverage.
A code coverage tool works with a specific programming language. Apart from that, such tools can
usually be integrated with build tools, CI servers, and IDEs.
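The tools named above target different languages (JaCoCo for Java, Istanbul for JavaScript, and so
on). As one hedged example, Python's coverage package can be driven from code roughly as shown
below; the tests/ directory is an assumption.

# Sketch: measure line and branch coverage of a unittest run using 'coverage'.
import unittest
import coverage

cov = coverage.Coverage(branch=True)   # also collect branch coverage
cov.start()

# Discover and run the test suite while coverage is being recorded.
suite = unittest.defaultTestLoader.discover("tests")
unittest.TextTestRunner().run(suite)

cov.stop()
cov.save()
cov.report()        # print line/branch coverage percentages per file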
Best practices
DevOps encompasses a wide range of practices aimed at improving collaboration, efficiency, and
automation in software development and operations.
Continuous Integration and Continuous Delivery (CI/CD): Implement automated CI/CD pipelines
to enable frequent and reliable software releases. Automate the build, test, and deployment processes to
ensure that code changes are integrated and delivered smoothly.
Infrastructure as Code (IaC): Use infrastructure as code principles and tools, such as Terraform or
CloudFormation, to define and manage infrastructure resources. This allows for reproducibility,
scalability, and version control of infrastructure deployments (a minimal sketch appears after this
list).
Automated Testing: Implement automated testing practices at different levels, including unit tests,
integration tests, and system tests. Automated testing helps catch defects early, ensure code quality, and
improve overall software reliability.
Security and Compliance: Bake security and compliance practices into your DevOps processes.
Conduct regular security assessments, follow secure coding practices, and enforce compliance
requirements from the early stages of development.
Resilience and Disaster Recovery: Design systems with resilience in mind, implementing redundancy,
fault tolerance, and disaster recovery mechanisms. Conduct regular drills and testing to ensure
readiness for potential failures and minimize downtime.
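The sketch below illustrates the infrastructure-as-code idea from the list above in Python, using
boto3 and AWS CloudFormation: the desired infrastructure (here, a single S3 bucket) is described as
data and submitted as a stack. The bucket and stack names are hypothetical, and Terraform or plain
CloudFormation templates are the more common way to express this.

# Infrastructure as code sketch: declare a resource as data and let
# CloudFormation create it. Requires boto3 and AWS credentials.
import json
import boto3

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "NotesBucket": {                       # hypothetical logical name
            "Type": "AWS::S3::Bucket",
            "Properties": {"BucketName": "example-devops-notes-bucket"},
        }
    },
}

cloudformation = boto3.client("cloudformation")
cloudformation.create_stack(
    StackName="devops-notes-demo",             # hypothetical stack name
    TemplateBody=json.dumps(template),
)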
Virtual Machines vs Containers
1. Virtual Machines:
Isolation: VMs provide strong isolation between applications and the host operating system. Each VM
has its own complete operating system and runs on a hypervisor that manages the hardware resources.
Resource Requirements: VMs are resource-intensive because each VM requires a separate operating
system and has its own memory, disk space, and CPU allocation. This can lead to higher resource
overhead and slower startup times.
Portability: VMs can be portable across different hypervisors and cloud platforms, allowing
applications to be migrated between environments. However, some level of reconfiguration and
image conversion may be required when moving a VM from one platform to another.
2. Containers:
Isolation: Containers provide lightweight application isolation by sharing the host operating system
kernel. Each container runs in its own isolated user space, but shares the host OS, which reduces
resource overhead and improves performance.
Resource Requirements: Containers have lower resource overhead compared to VMs because they
share the host operating system. Multiple containers can run on the same host with efficient resource
utilization.
Portability: Containers are highly portable and can run consistently across different environments,
including development, testing, and production. They provide application consistency and eliminate
potential environment-related issues.
Management: Container management platforms, such as Docker and Kubernetes, simplify the
management of containers, including deployment, scaling, orchestration, and automated lifecycle
management.
Scalability: Containers are designed for scalability and can be easily scaled horizontally by adding
more containers to handle increased workload. Container orchestration platforms provide dynamic
scaling based on demand.
Use Cases: Containers are well-suited for microservices architectures, modernizing applications, and
deploying cloud-native applications. They are used extensively in DevOps practices for building,
testing, and deploying applications in a fast and efficient manner.
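As a hedged illustration of how lightweight containers are to work with, the sketch below uses the
Docker SDK for Python (the docker package) to start a short-lived container. It assumes a local
Docker daemon is running and that the python:3.11-slim image is available or can be pulled.

# Run a short-lived container and capture its output.
# Requires the 'docker' Python package and a running Docker daemon.
import docker

client = docker.from_env()                      # connect to the local daemon
output = client.containers.run(
    "python:3.11-slim",                         # assumed base image
    ["python", "-c", "print('hello from a container')"],
    remove=True,                                # clean up after the run
)
print(output.decode())

The same image runs identically on a developer laptop, a test server, and production, which is
exactly the consistency benefit described above.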
Rolling Deployment
Rolling deployments, also known as rolling updates or rolling upgrades, are a deployment strategy used
in software development and operations to minimize downtime and ensure seamless updates of
applications or services. The rolling deployment approach involves gradually updating instances of an
application or service in a controlled manner while keeping the application available and responsive to
users. Here's how rolling deployments work:
5. Gradual Rollout: The update process continues in a gradual and iterative manner until all
instances have been updated. This approach allows for monitoring the impact of the update and
quickly addressing any issues or regressions that may arise.
2. Faster Rollback: If any issues are identified during the update, rolling deployments make it
easier to roll back to the previous version since only a subset of instances are updated at a time.
3. Risk Mitigation: Updating instances incrementally reduces the risk of widespread failures or
issues, as issues are isolated to the updated subset of instances, making it easier to identify and
address problems.
1. Compatibility: Ensure backward compatibility between the existing and updated versions to
avoid compatibility issues or data inconsistencies during the rolling deployment process.
2. Monitoring and Health Checks: Implement robust monitoring and health check mechanisms
to detect issues promptly and validate the health of each updated instance.
4. Rollback Plan: Have a well-defined rollback plan in case issues arise during the deployment.
This includes having backups, restoring previous versions, and communicating the rollback
process to stakeholders.
Rolling deployments are commonly used in DevOps practices and are facilitated by containerization
technologies, orchestration platforms like Kubernetes, and continuous integration and deployment
(CI/CD) pipelines. They provide a way to update applications and services seamlessly, reduce
downtime, and improve overall deployment reliability.
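A highly simplified sketch of the rolling-update idea in Python. The deploy_to() and is_healthy()
functions are hypothetical placeholders for real deployment and health-check calls; orchestration
platforms such as Kubernetes implement this logic for you.

# Sketch of a rolling deployment loop. The deploy and health-check functions
# are placeholder stubs; a real system would call its deployment tooling here.
import time

def deploy_to(instance, version):
    print(f"deploying {version} to {instance}")     # placeholder deploy call

def is_healthy(instance):
    return True                                      # placeholder health check

def rolling_deploy(instances, new_version, batch_size=2):
    for i in range(0, len(instances), batch_size):
        batch = instances[i:i + batch_size]
        for instance in batch:
            deploy_to(instance, new_version)
        time.sleep(1)                                # let the batch warm up
        if not all(is_healthy(inst) for inst in batch):
            raise RuntimeError("Rollout halted: unhealthy batch, roll back")

if __name__ == "__main__":
    rolling_deploy(["web-1", "web-2", "web-3", "web-4"], "v2.0")

Because only one small batch changes at a time, a failed health check stops the rollout before most
users are affected, which is the risk-mitigation property described above.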
Continuous Deployment
Continuous Deployment is a software development practice where code changes are automatically
deployed to production environments as soon as they pass the necessary automated tests and quality
checks. It is an extension of continuous integration and continuous delivery (CI/CD) practices, enabling
a rapid and frequent release cycle.
1. Automated Build and Test: Continuous Deployment relies on robust automation for building,
testing, and validating code changes. Automated processes ensure that code changes are
thoroughly tested to meet quality standards before deployment.
2. Integration with Version Control: Continuous Deployment is typically integrated with version
control systems, such as Git, to trigger deployment pipelines automatically when new code
changes are pushed or merged into the main branch.
3. Automated Deployment Pipelines: Deployment pipelines are set up to automate the entire
process from code commit to production deployment. These pipelines include stages for
building, testing, packaging, and deploying the application.
4. Continuous Testing: Continuous Deployment relies heavily on automated testing. Test suites,
including unit tests, integration tests, and end-to-end tests, are executed as part of the
deployment pipeline to ensure the quality and stability of the application.
6. Monitoring and Feedback Loops: Continuous Deployment requires robust monitoring and
feedback mechanisms. Monitoring tools and real-time metrics help detect issues or anomalies in
the production environment, providing feedback to the development team for quick response
and resolution.
3. Early Issue Detection: By continuously testing and validating code changes, issues or bugs can
be identified earlier in the development cycle, reducing the likelihood of major failures in
production.
4. Rapid Iteration and Innovation: Continuous Deployment fosters a culture of rapid iteration
and innovation. Developers can quickly receive user feedback and iterate on features,
incorporating improvements and new functionality in subsequent deployments.
1. Test Coverage: Robust test suites are crucial for ensuring code stability and quality.
Comprehensive unit tests, integration tests, and end-to-end tests should be part of the automated
testing strategy.
2. Deployment Rollbacks: Continuous Deployment requires a well-defined rollback strategy in
case issues arise after a deployment. The ability to roll back to a previous known-good version
quickly is important to minimize any potential negative impact.
3. Monitoring and Alerting: Implementing effective monitoring and alerting systems is essential
to detect and respond to issues promptly. Real-time monitoring helps identify problems early
and trigger appropriate actions for resolution.
Continuous Deployment is a practice that requires careful planning, automated processes, and strong
collaboration between development, operations, and quality assurance teams. It empowers
organizations to release software faster, respond to user feedback quickly, and continuously deliver
value to end-users.
Auto scaling
Auto Scaling is a key component of DevOps and cloud computing that enables automatic adjustment of
computing resources based on real-time demand. It allows applications to scale up or down
dynamically in response to changes in workload, ensuring optimal performance and cost efficiency.
1. Scaling Policies: Auto Scaling uses predefined scaling policies to determine when and how to
scale resources. These policies define thresholds or metrics, such as CPU utilization or network
traffic, that trigger scaling actions (a sketch of creating such a policy appears at the end of
this section).
2. Elasticity: Auto Scaling enables the automatic addition or removal of resources based on
demand. When the workload increases, additional instances or resources are provisioned to
handle the increased load. Conversely, when the demand decreases, excess resources are
automatically terminated or scaled down to save costs.
3. Load Balancing: Auto Scaling works in conjunction with load balancing mechanisms. As new
instances are added, load balancers distribute incoming traffic evenly across the instances,
ensuring efficient utilization of resources and high availability.
4. Monitoring and Metrics: Auto Scaling relies on real-time monitoring and metrics to make
scaling decisions. Metrics such as CPU utilization, network traffic, or application-specific
metrics are continuously monitored to determine when to scale up or down.
5. Integration with Orchestration: Auto Scaling is often integrated with container orchestration
platforms like Kubernetes or infrastructure management tools like AWS Auto Scaling. These
platforms provide automation capabilities and facilitate efficient scaling of resources.
6. Fault Tolerance: Auto Scaling enhances fault tolerance by ensuring that sufficient resources
are available to handle increased load or compensate for failed instances. It helps maintain high
availability and resilience of applications.
1. Monitoring and Alerting: Robust monitoring and alerting systems are essential to detect
changes in workload and trigger scaling actions promptly. Real-time metrics and proactive
monitoring help ensure timely resource adjustments.
2. Resource Constraints: Auto Scaling requires careful consideration of resource limits, such as
available compute resources, storage capacity, and network bandwidth. It is important to ensure
that sufficient resources are available to accommodate scaling demands.
3. Application Architecture: The application architecture should be designed to support Auto
Scaling. Applications should be stateless or capable of horizontal scaling, allowing new
instances to be added or removed seamlessly without impacting data consistency or user
experience.
4. Testing and Validation: Regular testing and validation of Auto Scaling configurations and
policies are crucial to ensure they function as expected. Load testing and performance testing
help identify any bottlenecks or limitations in the scaling capabilities.
Auto Scaling is a powerful tool in the DevOps toolbox that enables applications to scale dynamically
and efficiently based on workload fluctuations. It provides agility, cost optimization, and high
availability, allowing organizations to meet user demands effectively while optimizing resource
utilization.
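As a hedged example of point 1 above (scaling policies), the sketch below uses boto3 to attach a
target-tracking policy to an EC2 Auto Scaling group so that instances are added or removed to keep
average CPU utilization near 50%. The group and policy names are hypothetical, and valid AWS
credentials are assumed.

# Attach a target-tracking scaling policy to an Auto Scaling group.
import boto3

autoscaling = boto3.client("autoscaling")
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",                 # hypothetical group name
    PolicyName="keep-cpu-at-50-percent",            # hypothetical policy name
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 50.0,
    },
)

Once the policy is in place, CloudWatch metrics drive the scale-out and scale-in actions without
any manual intervention.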
OpenStack
OpenStack is an open-source cloud computing platform that provides infrastructure as a service (IaaS)
capabilities. It enables the creation and management of private and public clouds by providing a set of
software tools and components for building and managing cloud infrastructure.
2. Scalability and Flexibility: OpenStack is designed to be highly scalable and flexible, allowing
users to add and manage a large number of compute, storage, and networking resources. It can
be used to build private clouds within an organization's own data centers or public clouds for
providing services to external users.
3. Open Source Community: OpenStack is an open-source project with a large and active
community of contributors and users. The community-driven development model ensures
regular updates, bug fixes, and the addition of new features. It also promotes interoperability
and standards compliance across different OpenStack deployments.
4. APIs and Interoperability: OpenStack provides a rich set of APIs that allow users to interact
with and automate the management of cloud resources. These APIs enable integration with
other systems and tools, making it possible to build custom applications or leverage existing
cloud management tools.
6. Use Cases: OpenStack is used by organizations of various sizes and across different industries.
It is particularly popular in industries such as telecommunications, academia, research, and
media, where there is a need for scalable, on-demand infrastructure. OpenStack can be used to
build private clouds for internal use or public clouds for offering cloud services to external
customers.
7. Ecosystem and Vendor Support: OpenStack has a wide ecosystem of vendors and service
providers offering commercial distributions, support, and professional services around
OpenStack deployments. This allows organizations to leverage expertise and get assistance in
implementing and managing OpenStack-based cloud infrastructure.
How does OpenStack Work?
Basically, OpenStack is a collection of scripts (series of commands). These scripts are bundled
into packages called projects, which carry out the tasks that create cloud environments. To
construct those environments, OpenStack relies on two other forms of software:
Virtualization, which provides a layer of virtual resources abstracted from the hardware.
A base operating system that executes the commands provided by the OpenStack scripts.
So we can say that all three technologies, i.e., virtualization, the base operating system, and
OpenStack, must work together.
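As a small, hedged illustration of the OpenStack APIs mentioned earlier, the sketch below uses the
openstacksdk Python library to list compute instances. It assumes a cloud entry named devstack is
configured in clouds.yaml with valid credentials.

# List servers (instances) in an OpenStack cloud via the openstacksdk library.
import openstack

conn = openstack.connect(cloud="devstack")          # assumed clouds.yaml entry
for server in conn.compute.servers():
    print(server.name, server.status)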
Installation of OpenStack
Step 1: Update Ubuntu System
Step 2: Create a Stack User
Step 3: Install Git
Step 4: Download OpenStack (DevStack)
Step 5: Create a DevStack Configuration File
Step 6: Install OpenStack with DevStack
Step 7: Accessing OpenStack on a browser
Step 8: Create an Instance
Highlights of OpenStack
OpenStack has made it possible for companies such as Bloomberg and Disney to handle their
private clouds at very manageable prices.
OpenStack offers mixed hypervisor environments and bare metal server environments.
RedHat, SUSE Linux, and Debian have all been active contributors and have been supporting
OpenStack since its inception.
Case Study: Cloud-based ML Solutions in Healthcare
Cloud-based machine learning (ML) solutions in healthcare have gained significant traction in recent
years, offering numerous benefits such as scalability, accessibility, cost-efficiency, and collaboration
opportunities. Here are some common applications of cloud-based ML solutions in the healthcare
industry:
4. Remote Patient Monitoring: Cloud-based ML solutions enable the analysis of data from
remote patient monitoring devices, wearables, and sensors. ML algorithms can detect patterns,
identify deviations, and provide real-time insights for proactive healthcare interventions,
especially for chronic disease management (a toy sketch appears later in this section).
5. Drug Discovery and Personalized Medicine: Cloud-based ML platforms can analyze large-
scale genomic, proteomic, and metabolomic data to aid in drug discovery and development. ML
models can also facilitate precision medicine by predicting treatment responses based on an
individual's genetic profile, clinical data, and other relevant factors.
6. Health Chatbots and Virtual Assistants: Cloud-based ML-powered chatbots and virtual
assistants can provide personalized health information, answer queries, offer symptom
assessment, and provide recommendations for self-care. These solutions can enhance patient
engagement, provide 24/7 support, and triage healthcare resources effectively.
7. Healthcare Data Security and Privacy: Cloud-based ML solutions offer robust security
measures and compliance frameworks to protect sensitive patient data. Advanced encryption,
access controls, and data anonymization techniques help ensure data privacy and compliance
with regulations like HIPAA (Health Insurance Portability and Accountability Act).
Cloud providers like Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform
offer a range of cloud-based ML services, tools, and infrastructure that healthcare organizations can
leverage. These services provide pre-built ML models, data storage and processing capabilities, and
scalable computing resources to support ML workflows in healthcare.
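As a hedged, toy-scale illustration of the remote patient monitoring use case above, the sketch
below trains a simple classifier on synthetic vital-sign data with scikit-learn. Real cloud
deployments would use managed ML services and properly governed clinical datasets; nothing here is
a clinical model.

# Toy example: flag "abnormal" readings from synthetic heart-rate/SpO2 data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
heart_rate = rng.normal(75, 15, 500)                 # synthetic heart rates
spo2 = rng.normal(97, 2, 500)                        # synthetic oxygen saturation
X = np.column_stack([heart_rate, spo2])
# Label a reading abnormal if heart rate is high or oxygen saturation is low.
y = ((heart_rate > 100) | (spo2 < 92)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))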
However, it's important to consider data governance, regulatory compliance, and ethical implications
when implementing cloud-based ML solutions in healthcare. Adhering to data protection regulations,
ensuring data privacy, and maintaining transparency in ML algorithms are critical for maintaining trust
and ethical standards in healthcare applications.
Benefits: