CCD Chap Notes


Cloud computing:

Chapter 1: Cloud computing Fundamentals:

Definition of cloud computing:


According to the definition given by Armbrust

1) Cloud computing refers to both the applications delivered as services over the Internet and the
hardware and system software in the datacenters that provide those services.

2) According to the definition proposed by the U.S. National Institute of Standards and Technology
(NIST):
Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a
shared pool of configurable computing resources (e.g., networks, servers, storage, applications,
and services) that can be rapidly provisioned and released with minimal management effort or
service provider interaction.

3) RajKumar Buyya defined cloud computing based on the nature of utility computing
A cloud is a type of parallel and distributed system consisting of a collection of interconnected and
virtualized computers that are dynamically provisioned and presented as one or more unified
computing resources based on service-level agreements established through negotiation between
the service provider and consumers.

Q) What is the evolution of cloud computing?


Distributed Systems:
It is a composition of multiple independent systems but all of them are depicted as a single entity to the users. The
purpose of distributed systems is to share resources and also use them effectively and efficiently. Distributed
systems possess characteristics such as scalability, concurrency, continuous availability, heterogeneity, and
independence in failures. But the main problem with this system was that all the systems were required to be
present at the same geographical location.

Mainframe computing:
Mainframes, which first came into existence in 1951, are highly powerful and reliable computing machines. They are responsible for handling large volumes of data and massive input-output operations. Even today they are used for bulk processing tasks such as online transactions. These systems have almost no downtime and high fault tolerance. After distributed computing, mainframes increased the processing capability of systems.

Cluster computing:
In the 1980s, cluster computing emerged as an alternative to mainframe computing. Each machine in the cluster was connected to the others by a high-bandwidth network. Clusters were far cheaper than mainframe systems while being equally capable of high computation. Also, new nodes could easily be added to the cluster if required.

Grid computing:
In the 1990s, the concept of grid computing was introduced: different systems, placed at entirely different geographical locations, were connected via the internet. These systems belonged to different organizations, so the grid consisted of heterogeneous nodes. Although it solved some problems, new problems emerged as the distance between the nodes increased.

Virtualization:
It was introduced nearly 40 years ago. It refers to the process of creating a virtual layer over the hardware which allows the user to run multiple instances simultaneously on that hardware. It is a key technology used in cloud computing and the base on which major cloud computing services such as Amazon EC2 and VMware vCloud work.

Web 2.0:
It is the interface through which cloud computing services interact with clients. It is because of Web 2.0 that we have interactive and dynamic web pages, and it also increases flexibility among web pages. Popular examples of Web 2.0 include Google Maps and Facebook.

Service orientation:
It acts as a reference model for cloud computing. It supports low-cost, flexible, and evolvable applications. Two important concepts were introduced in this computing model: Quality of Service (QoS), which includes the SLA, and Software as a Service (SaaS).

Utility computing:
It is a computing model that defines service provisioning techniques for services such as compute, storage, and infrastructure, which are provisioned on a pay-per-use basis.
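As a rough illustration of the pay-per-use idea, the short Python sketch below totals a bill from metered usage. The rates and resource names are made up for illustration and are not any real provider's price list.

```python
# Minimal sketch of pay-per-use (utility) billing. The rates and resource
# names are illustrative assumptions, not any provider's real price list.

RATES = {
    "compute_hours": 0.05,     # dollars per VM-hour (assumed)
    "storage_gb_month": 0.03,  # dollars per GB-month (assumed)
    "egress_gb": 0.09,         # dollars per GB transferred out (assumed)
}

def monthly_bill(usage):
    """Return the total charge for a dict of {resource: amount used}."""
    return sum(RATES[resource] * amount for resource, amount in usage.items())

if __name__ == "__main__":
    usage = {"compute_hours": 720, "storage_gb_month": 500, "egress_gb": 40}
    print(f"Charge for the month: ${monthly_bill(usage):.2f}")   # $54.60
```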

1. What is virtualisation?
Virtualization is technology that you can use to create virtual representations of servers, storage, networks, and
other physical machines. Virtual software mimics the functions of physical hardware to run multiple virtual
machines simultaneously on a single physical machine.
One of the main cost-effective, hardware-reducing, and energy-saving techniques used by cloud providers is virtualization. Virtualization allows the sharing of a single physical instance of a resource or an application among multiple customers and organizations at one time.
A hypervisor is the software that creates and runs virtual machines.
The term virtualization is often synonymous with hardware virtualization, which plays a fundamental role in
efficiently delivering Infrastructure-as-a-Service (IaaS) solutions for cloud computing. Moreover, virtualization
technologies provide a virtual environment for not only executing applications but also for storage, memory, and
networking.

Advantages of Virtualization
More flexible and efficient allocation of resources.
Enhance development productivity.
It lowers the cost of IT infrastructure.
Remote access and rapid scalability.
High availability and disaster recovery.
Pay-per-use of the IT infrastructure on demand.
Enables running multiple operating systems.
All VMs work independently.

Virtualization involves the creation of something's virtual platform, including virtual computer
hardware, virtual storage devices and virtual computer networks.

Software called a hypervisor is used for hardware virtualization. With the help of a virtual machine hypervisor, software is incorporated into the server hardware component. The role of the hypervisor is to control the physical hardware that is shared between the client and the provider. Hardware virtualization is done using a Virtual Machine Monitor (VMM) to abstract the physical hardware. There are several processor extensions that help to speed up virtualization activities and increase hypervisor performance. When this virtualization is done for the server platform, it is called server virtualization.
The hypervisor creates an abstraction layer between the software and the hardware in use. After a hypervisor is installed, workloads see virtual representations of resources, such as virtual processors, rather than using the physical processors directly. There are several popular hypervisors, including ESXi-based VMware vSphere and Hyper-V.
FIGURE 1.14 Hardware Virtualization

Virtual machine instances are typically represented by one or more files, which can be easily transported across physical systems. In addition, they are self-contained, since they have no dependencies for their use other than the virtual machine manager.

A process virtual machine, sometimes known as an application virtual machine, runs inside a host OS as an ordinary application and supports a single process. It is created when the process starts and destroyed when it ends. Its aim is to provide a platform-independent programming environment that abstracts away the details of the underlying hardware or operating system and allows a program to run in the same way on any platform. For example, Wine on Linux lets you run Windows applications.

A process VM provides a high-level abstraction, that of a high-level programming language (compared with the low-level ISA abstraction of a system VM). Process VMs are implemented by means of an interpreter; just-in-time compilation achieves performance comparable to compiled programming languages.

The Java programming language introduced with the Java virtual machine has become popular
with this form of VM. The .NET System, which runs on a VM called the Common Language
Runtime, is another example.
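CPython is itself an everyday example of a process VM: Python source is compiled to platform-independent bytecode that the interpreter executes instead of native CPU instructions. The short snippet below uses the standard dis module to show that bytecode.

```python
# CPython is a process VM: source code is compiled to bytecode and executed by
# the interpreter loop, analogous to the JVM and the .NET Common Language Runtime.
import dis

def add(a, b):
    return a + b

# Disassemble the function to see the platform-independent bytecode that the
# CPython virtual machine executes rather than native CPU instructions.
dis.dis(add)
```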
FIGURE 1.15 process virtual machine design
(Reference: "Mastering Cloud Computing: Foundations and Applications Programming" by Rajkumar Buyya)

Q) Properties and characteristics of cloud computing?


1.0.1 Characteristics and benefits
As cloud computing services mature both commercially and technologically, it will become easier for companies to maximize their potential benefits. However, it is equally important to know what cloud computing is and what it does.

FIGURE 1. 7 Features of Cloud Computing


Following are the characteristics of Cloud Computing:
1. Resource Pooling
This means that the cloud provider uses a multi-tenant model to deliver computing resources to various customers. Physical and virtual resources are dynamically allocated and reassigned according to customer demand. In general, the customer has no control over or knowledge of the exact location of the provided resources, but may be able to specify location at a higher level of abstraction.

2. On-Demand Self-Service
This is one of the most useful advantages of cloud computing: the user can track server uptime, capability, and allotted network storage on an ongoing basis, and can monitor computing functionalities as well.

3. Easy Maintenance
The servers are easily managed and downtime is minimal; in most cases there is no downtime at all. Cloud computing delivers frequent updates that progressively enhance the service. The updates are more system-friendly and, with bugs already patched, work faster than the older versions.

4. Large Network Access
The user can use any device with an Internet connection to access cloud data or upload data to the cloud from anywhere. These capabilities are available over the network and accessed through the Internet.

5. Availability
Cloud capabilities can be modified and extended according to usage, which lets the consumer buy additional cloud storage for a very small price if necessary.

6. Automatic System
Cloud computing automatically analyzes the data required and supports a metering capability at some level of service. Usage can be tracked, managed, and reported, providing accountability to both the host and the customer.

7. Economical
It is a one-off investment: the company (host) buys the storage once and can make it available to many companies, saving them from monthly or annual costs. Only the amounts spent on basic maintenance and a few additional costs remain, and these are much smaller.

8. Security
Cloud security is one of cloud computing's best features. It keeps a snapshot of the stored data, so that even if one of the servers is damaged, the data is not lost. The information is held on storage devices that no other person can hack or misuse. The storage service is fast and reliable.

9. Pay as you go
Users only pay for the service or the storage space they actually use; there are no hidden or additional charges. The service is economical, and some space is often allocated free of charge.

10. Measured Service
Cloud computing resources are monitored and recorded by the provider, and resource use is analyzed with charge-per-use capabilities. This means that resource use can be measured and reported by the service provider, for example per virtual server instance running in the cloud, and you pay based on actual consumption (a small metering sketch follows this list).
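A minimal sketch of the metering behind measured service and pay-as-you-go is given below; the tenant and resource names are illustrative only, not any provider's actual accounting model.

```python
# Minimal sketch of the "measured service" idea: usage samples are recorded per
# tenant and per resource, then aggregated into a report that both the provider
# and the customer can inspect. All names are illustrative assumptions.
from collections import defaultdict

class UsageMeter:
    def __init__(self):
        self._totals = defaultdict(float)

    def record(self, tenant, resource, amount):
        """Record an amount of a resource (e.g. VM-hours, GB stored) for a tenant."""
        self._totals[(tenant, resource)] += amount

    def report(self, tenant):
        """Return the aggregated usage for one tenant."""
        return {res: amt for (t, res), amt in self._totals.items() if t == tenant}

meter = UsageMeter()
meter.record("acme", "vm_hours", 12.5)
meter.record("acme", "storage_gb", 200)
meter.record("acme", "vm_hours", 7.5)
print(meter.report("acme"))   # {'vm_hours': 20.0, 'storage_gb': 200.0}
```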

Q) Challenges and risks in cloud computing?


Everything has advantages and challenges. We have seen many cloud features, and it is now time to identify the challenges and risks of cloud computing, along with tips and techniques for recognizing them on your own. Nearly all companies use cloud computing because they need to store the tremendous amounts of data they generate, and as a result they face many security issues. Companies therefore put arrangements in place to streamline and optimize the process and to improve cloud computing management.


This is a list of all cloud computing threats and challenges:
1. Security & Privacy
2. Interoperability & Portability
3. Reliability and Flexibility
4. Cost
5. Downtime
6. Lack of resources
7. Dealing with Multi-Cloud Environments
8. Cloud Migration
9. Vendor Lock-In
10. Privacy and Legal issues

1. Security and Privacy of Cloud

The cloud data store must be secure and confidential, and clients are heavily dependent on the cloud provider. In other words, the cloud provider must take the security measures necessary to secure customer data. Security is also the customer's responsibility: customers must use strong passwords, not share passwords with others, and update passwords regularly. If the data are outside the firewall, certain problems may occur which the cloud provider must address. Hacking and malware are also among the biggest problems, because they can affect many customers: they can cause data loss, disrupt the encrypted file system, and create several other issues.
2. Interoperability and Portability
Migration services into and out of the cloud should be provided to the customer, with no lock-in period, as that can hamper customers. The cloud should also be capable of supplying on-premises facilities. Remote access is another consideration: the customer should retain the ability to access the cloud from anywhere.
3. Reliability and Flexibility
Reliability and flexibility are indeed a difficult task for cloud customers: leakage of the data entrusted to the cloud must be prevented and the provider must earn the customer's trust. To overcome this challenge, third-party services should be monitored, and the performance, robustness, and dependability of the supplying companies supervised.
4. Cost
Cloud computing is affordable, but tailoring the cloud to customer demand can sometimes be expensive, and this can hinder small businesses, since altering the cloud to match demand can sometimes cost more. Furthermore, transferring data from the cloud to on-premises systems is sometimes costly.
5. Downtime
Downtime is the most commonly cited cloud computing challenge, as no cloud provider guarantees a platform free from downtime. The internet connection also plays an important role: a company with an unreliable internet connection will experience downtime.
6. Lack of resources
The cloud industry also faces a lack of resources and expertise, and many businesses hope to overcome this by hiring new, more experienced employees. These employees will not only help solve the challenges of the business but will also train existing employees to benefit the company. Currently, many IT employees are working to enhance their cloud computing skills, and it is difficult for executives because the workforce is not yet well qualified. It is argued that employees with exposure to the latest innovations and associated technologies will become increasingly valuable to businesses.



7. Dealing with Multi-Cloud Environments
Today hardly any business operates entirely on a single cloud. According to the RightScale report, almost 84 percent of enterprises adopt a multi-cloud approach and 58 percent have a hybrid cloud approach that mixes public and private clouds. In addition, organizations use around five different public and private clouds.

FIGURE 1. 8 RightScale 2019 report revelation


IT infrastructure teams find it difficult to make long-term predictions about the future of cloud computing technology. Professionals have suggested several strategies to address this problem, such as rethinking processes, training personnel, adopting the right tools, actively managing vendor relationships, and further study.

8. Cloud Migration
While it is very simple to launch a new app in the cloud, transferring an existing app to a cloud computing environment is harder. According to one report, 62% said their cloud migration projects were harder than they expected, 64% of migration projects took longer than expected, and 55% exceeded their budgets. In particular, organizations that migrated their applications to the cloud reported migration downtime (37%), problems synchronizing data before cutover (40%), trouble getting migration tooling to work well (40%), slow migration of data (44%), security configuration issues (40%), and time-consuming troubleshooting (47%). To solve these problems, close to 42% of IT experts said they wanted to see their budgets increased, around 45% wanted to bring in an in-house professional, 50% wanted to allow more time for the project, and 56% wanted more pre-migration testing.
9. Vendor lock-in
Vendor lock-in in cloud computing means that clients become dependent (i.e., locked in) on a single cloud provider's implementation and cannot switch to another vendor in the future without significant costs, legal constraints, or technical incompatibilities. The lock-in situation can be seen in applications built for specific cloud platforms, such as Amazon EC2 or Microsoft Azure, that are not easily moved to any other cloud platform, leaving users vulnerable to changes made by their providers; this is especially visible from a software developer's perspective. In fact, the issue of lock-in arises when, for example, a company decides to change cloud providers (or to integrate services from different providers) but cannot move applications or data across different cloud services, because the semantics of the cloud providers' resources and services do not correspond. This heterogeneity of cloud semantics and cloud APIs creates technical incompatibility, which in turn leads to interoperability and portability challenges. This makes it very complicated and difficult to interoperate, cooperate, port, handle, and maintain data and services. For these reasons, from the company's point of view it is important to retain the flexibility to change providers according to business needs, or even to keep in-house certain components that are less critical to safety, because of these risks. The issue of vendor lock-in hinders interoperability and portability between cloud providers; addressing it is the way for cloud providers and clients to become more competitive.
10. Privacy and Legal issues
The main problem regarding cloud privacy and data security is the data breach. A data breach can be generically defined as the loss of electronically stored personal information. A breach can lead to a multitude of losses for both the provider and the customer: identity theft and debit/credit card fraud for the customer, and loss of credibility, future lawsuits, and so on for the provider. In the event of a data breach, American law requires notification of the affected persons, and nearly every state in the USA now requires data breaches to be reported to those affected. Problems arise when data are subject to several jurisdictions whose data privacy laws differ. For example, the European Union's Data Protection Directive states that data can only leave the EU if it goes to a country that ensures an 'adequate level of protection.' This rule, while simple to state, limits the movement of data and thus decreases data capacity, and the EU's regulations can be enforced.

Q) Explain hardware virtualization


Hardware virtualization is the method used to create virtual versions of physical desktops and operating
systems. It uses a virtual machine manager (VMM) called a hypervisor to provide abstracted hardware to
multiple guest operating systems, which can then share the physical hardware resources more efficiently.
Hardware virtualization, also known as platform virtualization, is a technology that enables the creation
and operation of virtual machines (VMs) on a physical computing system. It allows multiple operating
systems and applications to run simultaneously on a single hardware platform, as if they were running on
separate physical machines.
In hardware level virtualization, a software layer called a hypervisor, also known as a virtual machine
monitor (VMM), is installed on the host machine. The hypervisor acts as an intermediary between the
physical hardware and the virtual machines, managing the allocation of hardware resources such as CPU,
memory, storage, and network interfaces between those machines.
The hypervisor creates virtual instances of the underlying hardware, including virtual CPUs, memory
spaces, and disk storage, which are then assigned to each virtual machine. This enables each VM to operate
independently, with its own isolated environment, as if it were running on dedicated hardware.
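The resource bookkeeping a hypervisor performs can be pictured with the toy model below. It is purely conceptual (an assumption-laden sketch): real hypervisors such as KVM, ESXi, or Hyper-V also handle CPU scheduling, memory paging, device emulation, and I/O.

```python
# Toy model of a hypervisor's bookkeeping: a host has fixed physical capacity,
# and each VM is granted a slice of vCPUs and memory. Purely conceptual sketch.
from dataclasses import dataclass, field

@dataclass
class Host:
    cpus: int
    memory_gb: int
    vms: dict = field(default_factory=dict)

    def free_cpus(self):
        return self.cpus - sum(vcpus for vcpus, _ in self.vms.values())

    def free_memory(self):
        return self.memory_gb - sum(mem for _, mem in self.vms.values())

    def create_vm(self, name, vcpus, mem_gb):
        # Refuse to over-commit in this simple model; real hypervisors may allow it.
        if vcpus > self.free_cpus() or mem_gb > self.free_memory():
            raise RuntimeError("insufficient physical capacity")
        self.vms[name] = (vcpus, mem_gb)

host = Host(cpus=16, memory_gb=64)
host.create_vm("web-1", vcpus=4, mem_gb=8)
host.create_vm("db-1", vcpus=8, mem_gb=32)
print(host.free_cpus(), host.free_memory())   # 4 24
```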
Isolation: Hardware-based virtualization provides strong isolation between virtual machines, which means
that any problems in one virtual machine will not affect other virtual machines running on the same
physical host.

Resource allocation: Hardware-based virtualization allows for flexible allocation of hardware resources
such as CPU, memory, and I/O bandwidth to virtual machines.

Snapshot and migration: Hardware-based virtualization allows for the creation of snapshots, which can be
used for backup and recovery purposes. It also allows for live migration of virtual machines between
physical hosts, which can be used for load balancing and other purposes.

Support for multiple operating systems: Hardware-based virtualization supports multiple operating
systems, which allows for the consolidation of workloads onto fewer physical machines, reducing
hardware and maintenance costs.

Compatibility: Hardware-based virtualization is compatible with most modern operating systems, making
it easy to integrate into existing IT infrastructure.
Advantages of hardware-based virtualization –
It reduces the maintenance overhead of paravirtualization as it reduces (ideally, eliminates) the modifications needed in the guest operating system. It also makes it considerably easier to attain enhanced performance. Practical benefits of hardware-based virtualization have been reported by VMware engineers and Virtual Iron.

Disadvantages of hardware-based virtualization –


Hardware-based virtualization requires explicit support in the host CPU, which may not be available on all x86/x86_64 processors. A "pure" hardware-based virtualization approach, including the entire unmodified guest operating system, involves many VM traps and thus a rapid increase in CPU overhead, which limits the scalability and efficiency of server consolidation. This performance hit can be mitigated by the use of para-virtualized drivers; the combination has been called "hybrid virtualization".
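On Linux, the presence of these CPU virtualization extensions can be checked by looking for the vmx (Intel VT-x) or svm (AMD-V) flags in /proc/cpuinfo, as in the small sketch below. This check is Linux-specific and is offered only as an illustration.

```python
# Quick check (Linux only) for the hardware virtualization extensions that
# hardware-based virtualization depends on: "vmx" marks Intel VT-x, "svm"
# marks AMD-V. Without them, a hypervisor must fall back to software
# techniques such as binary translation or paravirtualization.
def has_hw_virtualization(cpuinfo_path="/proc/cpuinfo"):
    try:
        with open(cpuinfo_path) as f:
            flags = f.read()
    except OSError:
        return None  # not a Linux system, or the file is unreadable
    return ("vmx" in flags) or ("svm" in flags)

print("Hardware virtualization support:", has_hw_virtualization())
```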

What are Different Types of Hardware Virtualization


Full Virtualization
With full virtualization, one of the different hardware virtualization types, VMs run their own operating
systems and applications, just as if they were on separate physical machines. This allows for great
flexibility and compatibility. You can have VMs running different operating systems, like Windows, Linux,
or even exotic ones, all coexisting peacefully on the same physical hardware.

Advantages
One of the key advantages is isolation. Each VM operates in its own virtual bubble, protected from the
chaos that might arise from other VMs sharing the same hardware.
Furthermore, full virtualization enables the migration of VMs between physical hosts. Imagine the ability
to move a running VM from one physical server to another, like a teleportation trick. This live migration
feature allows for workload balancing, hardware maintenance without downtime, and disaster recovery.
Full virtualization also plays a vital role in testing and development environments. It allows developers to
create different VMs for software testing, without the need for dedicated physical machines. This helps
them save a lot of money, time, and efforts in the long run.

Emulation Virtualization
Emulation virtualization, the next one in different types of hardware virtualization, relies on a clever
technique known as hardware emulation. Through hardware emulation, a virtual machine monitor, or
hypervisor, creates a simulated hardware environment within each virtual machine.
This simulated environment replicates the characteristics and behaviour of the desired hardware platform,
even if the underlying physical hardware is different. It's like putting on a digital costume that makes the
virtual machine look and feel like it's running on a specific type of hardware.

Advantages
But how does this aid in enabling hardware virtualization? Well, the main advantage of emulation
virtualization lies in its flexibility and compatibility. It enables virtual machines to run software that may be
tied to a specific hardware platform, without requiring the exact hardware to be present.
This flexibility is particularly useful in scenarios where legacy software or operating systems need to be
preserved or migrated to modern hardware. Emulation virtualization allows these legacy systems to
continue running on virtual machines, ensuring their longevity and compatibility with new hardware
architectures.
It is a powerful tool in the virtualization magician's arsenal, allowing us to transcend the limitations of
physical hardware and embrace a world of endless possibilities.


Para-Virtualization
Unlike other types of hardware virtualization, para-virtualization requires some special coordination
between the virtual machine and the hypervisor. The guest operating system running inside the virtual
machine undergoes slight modifications. These modifications introduce specialised API calls, allowing the
guest operating system to communicate directly with the hypervisor.

Advantages
This direct communication eliminates the need for certain resource-intensive tasks, such as hardware
emulation, which is required in full virtualization. By bypassing these tasks, para-virtualization can achieve
higher performance and efficiency compared to other virtualization techniques.
Para-virtualization shines in scenarios where performance is paramount. It's like having a race car driver
and a skilled navigator working together to achieve the fastest lap times. By leveraging the direct
communication between the guest operating system and the hypervisor, para-virtualization minimises the
overhead and latency associated with traditional virtualization approaches.
This performance boost is particularly beneficial for high-performance computing, real-time systems, and
I/O-intensive workloads. It's like having a turbocharger that boosts the virtual machine's performance,
enabling it to handle demanding tasks with efficiency and precision.

Advantages of Hardware Virtualization


Improved Resource Utilisation
With hardware virtualization, you can maximise the utilisation of physical resources such as CPU,
memory, and storage.
By running multiple virtual machines (VMs) on a single physical server, you can effectively make use of
the available resources.
Enhanced Scalability
Hardware virtualization enables you to easily scale your infrastructure to meet changing demands. Whether
you need to add more virtual machines or allocate additional resources to existing VMs, virtualization
allows for seamless scalability. It's like having the ability to expand your stage and accommodate more
performers as the audience grows.
Increased Flexibility and Agility
Virtualization offers flexibility by decoupling the software from the underlying hardware.
You can run different operating systems and applications on the same physical server, allowing for diverse
workloads and environments.
Cost Savings
One of the major benefits of hardware virtualization is significant cost savings. By consolidating multiple
physical servers into a virtualized environment, you reduce the need for additional hardware, power
consumption, and cooling costs. It enables optimising your expenses by sharing resources efficiently.
Improved Disaster Recovery and Business Continuity
Virtualization provides robust disaster recovery capabilities. With features like live migration and
snapshots, you can easily move virtual machines between physical hosts or create point-in-time backups. In
the event of hardware failure or a disaster, you can quickly restore operations, minimising downtime and
ensuring business continuity. It's like having an emergency plan that allows you to seamlessly switch
venues and continue with the work.
Simplified Testing and Development
Virtualization simplifies the process of testing and development. You can create isolated virtual
environments to test new software, configurations, or updates without impacting production systems. This
also can help you save a lot of time you’d have invested in gathering all the hardware for different
machines.
Enhanced Security
Hardware virtualization can improve security by isolating virtual machines from each other. Even if one
VM is compromised, the others remain unaffected.
Green IT and Environmental Benefits
By consolidating workloads onto fewer physical servers, virtualization reduces power consumption and cooling requirements, which lowers the environmental footprint of the IT infrastructure.

Chapter2: Cloud Architecture and cloud service management.

2. Draw architecture of cloud computing

1. Frontend:
The frontend of the cloud architecture refers to the client side of the cloud computing system. It contains all the user interfaces and applications which the client uses to access the cloud computing services/resources, for example a web browser used to access the cloud platform.
Client Infrastructure – Client infrastructure is part of the frontend component. It contains the applications and user interfaces which are required to access the cloud platform; in other words, it provides a GUI (Graphical User Interface) to interact with the cloud.
2. Backend :
Backend refers to the cloud itself which is used by the service provider. It contains the resources as well as
manages the resources and provides security mechanisms. Along with this, it includes huge storage, virtual
applications, virtual machines, traffic control mechanisms, deployment models, etc.

Application –
The application in the backend refers to the software or platform that the client accesses; it provides the service in the backend as per the client's requirements.
Service –
Service in backend refers to the major three types of cloud based services like SaaS, PaaS and IaaS. Also
manages which type of service the user accesses.
Runtime Cloud-
Runtime cloud in backend provides the execution and Runtime platform/environment to the Virtual
machine.
Storage –
Storage in backend provides flexible and scalable storage service and management of stored data.
Infrastructure –
Cloud Infrastructure in backend refers to the hardware and software components of cloud like it includes
servers, storage, network devices, virtualization software etc.
Management –
Management in backend refers to management of backend components like application, service, runtime
cloud, storage, infrastructure, and other security mechanisms etc.
Security –
Security in the backend refers to the implementation of different security mechanisms to secure cloud resources, systems, files, and infrastructure for end users.
Internet –
The internet connection acts as the medium or bridge between the frontend and the backend, establishing the interaction and communication between them (a minimal sketch of this interaction follows this list).
Database – The database in the backend provides databases for storing structured data, such as SQL and NoSQL databases. Examples of database services include Amazon RDS, Microsoft Azure SQL Database, and Google Cloud SQL.
Networking – Networking in the backend refers to services that provide networking infrastructure for applications in the cloud, such as load balancing, DNS, and virtual private networks.
Analytics – Analytics in the backend refers to services that provide analytics capabilities for data in the cloud, such as warehousing, business intelligence, and machine learning.
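A minimal sketch of the frontend-to-backend interaction described above is shown below: client-side code talks to the provider's backend over the internet through an HTTP API. The endpoint URL, token, and response format are hypothetical, not any real provider's API.

```python
# Sketch of the frontend talking to the cloud backend over the Internet via an
# HTTP API. The endpoint and response fields are hypothetical placeholders.
import requests

API = "https://api.example-cloud.com/v1"   # hypothetical backend endpoint

def list_virtual_machines(api_token):
    resp = requests.get(
        f"{API}/vms",
        headers={"Authorization": f"Bearer {api_token}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()   # assumed to be a JSON list of VM descriptions

# Example usage (requires a real endpoint and token):
# vms = list_virtual_machines("my-token")
```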
CLOUD SERVICE MODELS:

A cloud delivery model represents a specific, pre-packaged combination of IT resources offered by a cloud provider. Three common cloud delivery models have become broadly recognized and formalized:
• Infrastructure-as-a-Service (IaaS)
• Platform-as-a-Service (PaaS)
• Software-as-a-Service (SaaS)

2.1.4.1. Infrastructure-as-a-Service (IaaS)

The IaaS delivery model represents a self-contained IT environment comprised of infrastructure-centric IT resources that can be accessed and managed via cloud service-based interfaces and tools. This environment can include hardware, network connectivity, operating systems, and other "raw" IT resources. In contrast to traditional hosting or outsourcing arrangements, with IaaS the IT resources are typically virtualized and packaged into bundles that simplify up-front runtime scaling and customization of the infrastructure. The general purpose of an IaaS environment is to provide cloud consumers with a high level of control and responsibility over its configuration and utilization.

The IT resources provided by IaaS are generally not pre-configured, placing the administrative responsibility directly upon the cloud consumer. This model is therefore used by cloud consumers that require a high level of control over the cloud-based environment they intend to create. Sometimes cloud providers will contract IaaS offerings from other cloud providers in order to scale their own cloud environments. The types and brands of the IT resources provided by IaaS products can vary between cloud providers. IT resources available through IaaS environments are generally offered as freshly initialized virtual instances. A central and primary IT resource within a typical IaaS environment is the virtual server. Virtual servers are leased by specifying server hardware requirements, such as processor capacity, memory, and local storage, as shown in Figure 2.1.8.


Figure 2.1.8. A cloud consumer is using a virtual server within an IaaS environment. Cloud consumers are provided with a range of contractual guarantees by the cloud provider, relating to characteristics such as capacity, performance, and availability.

(Reference: Cloud Computing: Concepts, Technology & Architecture, by Thomas Erl, Zaigham Mahmood, and Ricardo Puttini)
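As an illustration of leasing a virtual server by specifying its hardware profile, the sketch below uses the AWS SDK for Python (boto3). The machine image ID, instance type, and region are placeholders, and configured AWS credentials are assumed; this is not part of the referenced text.

```python
# Sketch of provisioning a virtual server in an IaaS environment with the AWS
# SDK for Python (boto3). The AMI ID, instance type, and region are
# placeholders; valid AWS credentials must already be configured.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder machine image
    InstanceType="t3.micro",           # requested CPU/memory profile
    MinCount=1,
    MaxCount=1,
)
instance_id = response["Instances"][0]["InstanceId"]
print("Provisioned virtual server:", instance_id)
```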

2.1.4.2. Platform-as-a-Service (PaaS)

The PaaS delivery model represents a pre-defined, "ready-to-use" environment typically comprised of already deployed and configured IT resources. Specifically, PaaS relies on the usage of a ready-made environment that establishes a set of pre-packaged products and tools used to support the entire delivery lifecycle of custom applications.

Common reasons a cloud consumer would use and invest in a PaaS environment include:

• The cloud consumer wants to extend on-premise environments into the cloud for scalability and economic purposes.

• The cloud consumer uses the ready-made environment to entirely substitute an on-premise environment.

• The cloud consumer wants to become a cloud provider and deploys its own cloud services to be made available to other external cloud consumers.

By working within a ready-made platform, the cloud consumer is spared the administrative burden of setting up and maintaining the bare infrastructure IT resources provided via the IaaS model. Conversely, the cloud consumer is granted a lower level of control over the underlying IT resources that host and provision the platform (Figure 2.1.9).

Figure 2.1.9. A cloud consumer is accessing a ready-made PaaS environment. The question
mark indicates that the cloud consumer is intentionally shielded from the implementation
details of the platform.

(Reference: Cloud Computing: Concepts, Technology & Architecture, by Thomas Erl, Zaigham Mahmood, and Ricardo Puttini)



PaaS products are available with different development stacks. For example, Google
App Engine offers a Java and Python-based environment.
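As a sketch of what a cloud consumer actually hands to such a platform, the minimal web application below (written with Flask) contains only application code; the PaaS is assumed to supply the runtime, web server, scaling, and monitoring. It is an illustrative example, not a description of any specific platform's deployment format.

```python
# Minimal example of the kind of application a cloud consumer deploys to a
# Python-based PaaS: only the application code is written; the platform
# supplies the runtime, web serving, scaling, and monitoring.
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello from a PaaS-hosted application!"

if __name__ == "__main__":
    # Local test run only; on the platform, its own server imports `app`.
    app.run(host="127.0.0.1", port=8080)
```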

2.1.4.3. Software-as-a-Service (SaaS)

A software program positioned as a shared cloud service and made available as a "product" or generic utility represents the typical profile of a SaaS offering. The SaaS delivery model is typically used to make a reusable cloud service widely available (often commercially) to a diverse range of cloud consumers. An entire marketplace exists around SaaS products, which can be leased and used for different purposes and via different terms.

Figure 2.1.10. The cloud service consumer is given access to the cloud service, but not to any underlying IT resources or implementation details.

(Reference: Cloud Computing: Concepts, Technology & Architecture, by Thomas Erl, Zaigham Mahmood, and Ricardo Puttini)

A cloud consumer is generally granted limited administrative control over a SaaS implementation. It is most often provisioned by the cloud provider, but it can be formally owned by whichever entity assumes the cloud service owner role. For example, an organization acting as a cloud consumer while using and working with a PaaS environment can build a cloud service that it decides to deploy in that same environment as a SaaS offering. The same organization then effectively assumes the cloud provider role, because the SaaS-based cloud service is made available to other organizations that act as cloud consumers when using that cloud service.

Q) What are the layers of cloud computing?

1) Infrastructure as a Service (IaaS): IaaS is the basic layer of the cloud, comprising hardware and network. That said, IaaS differs from a regular server in that it comes with two key features of cloud technology: virtualisation and scalability. IaaS service providers scale this layer in such a manner that the additional cost of adding more storage or bandwidth is minimal. Owing to virtualisation, these providers are able to use up to 90% of their computing resources, in contrast to traditional hosting services where servers may lie idle at times.
It provides infrastructure used by system administrators and network architects.
It provides the underlying OS, security, networking, and servers.
It offers services where you can see IP addresses, store data on virtual machines, and use virtual local area networks and load balancers.

Advantages of IaaS
The resources can be deployed by the provider to a customer’s environment at any given time.
Its ability to offer the users to scale the business based on their requirements.
The provider has various options when deploying resources including virtual machines, applications,
storage, and networks.
It has the potential to handle an immense number of users.
It is easy to expand and saves a lot of money, so companies can afford the advanced technologies they would otherwise struggle to implement.
Cloud provides the architecture.
Enhanced scalability and quite flexible.
Dynamic workloads are supported.
Disadvantages of IaaS
Security issues are there.
Service and network delays are quite an issue in IaaS.

2) Platform as a Service (PaaS): This layer of the cloud caters to the requirements of software
developers as it is the place where new applications are developed. You can use PaaS services to
build and test your applications on the cloud before deploying them. In fact, it is designed in such a
manner that it supports the entire lifecycle of an application, right from building, testing and
deployment to maintenance and updating. Like IaaS, PaaS includes infrastructure, but also includes
development tools, database management systems and much more.
Developers use it.
It offers more control than SaaS.
It provides a platform and environment that allow developers to build applications and services over the internet.
Services are hosted in the cloud and accessed by users via a web browser.
Users have no control over the underlying infrastructure; they interact with the user interface provided by the vendor, who hosts the hardware and software on its own infrastructure.

Advantages of PaaS
Programmers need not worry about what specific database or language the application has been
programmed in.
It offers developers the ability to build applications without the overhead of the underlying operating system or infrastructure.
Provides the freedom to developers to focus on the application’s design while the platform takes care
of the language and the database.
It is flexible and portable.
It is quite affordable.
It manages application development phases in the cloud very efficiently.
Disadvantages of PaaS
Data is not fully under the user's control and is at significant risk.
As data is stored both in local storage and in the cloud, there is a high chance of data mismatch when integrating the data.

3) Software as a Service (SaaS): The third and final layer of the cloud comes with a complete software solution. Here, organizations rent the usage of a SaaS application, and users connect to it by means of the internet. The application therefore needs to be hosted on a web server so that it can be accessed from anywhere. In this case, the service provider offers both the software and the hardware.

Advantages of SaaS
It is a cloud computing service category providing a wide range of hosted capabilities and services.
These can be used to build and deploy web-based software applications.
It provides a lower cost of ownership than on-premises software. The reason is it does not require the
purchase or installation of hardware or licenses.
It can be easily accessed through a browser or a thin client.
No cost is required for initial setup.
Low maintenance costs.
Installation time is less, so time is managed properly.
Disadvantages of SaaS
Low performance.
It has limited customization options.
It has security and data concerns.

Layered Architecture of Cloud

Application Layer

1. The application layer, which is at the top of the stack, is where the actual cloud
apps are located. Cloud applications, as opposed to traditional applications, can
take advantage of the automatic-scaling functionality to gain greater
performance, availability, and lower operational costs.
2. This layer consists of different Cloud Services which are used by cloud users.
Users can access these applications according to their needs. Applications are
divided into Execution layers and Application layers.
3. In order for an application to transfer data, the application layer determines
whether communication partners are available. Whether enough cloud resources
are accessible for the required communication is decided at the application layer.
Applications must cooperate in order to communicate, and an application layer
is in charge of this.
4. The application layer, in particular, is responsible for processing IP traffic
handling protocols like Telnet and FTP. Other examples of application layer
systems include web browsers, SNMP protocols, HTTP protocols, or HTTPS,
which is HTTP’s successor protocol.
Platform Layer
1. The operating system and application software make up this layer.
2. Users should be able to rely on the platform to provide Scalability, Dependability, and Security Protection. It gives users a space to create their apps, test operational processes, and keep track of execution outcomes and performance, and it serves as the application-layer foundation on which SaaS applications are implemented.
3. The objective of this layer is to deploy applications directly on virtual machines.
4. Operating systems and application frameworks make up the platform layer, which is built on top of the infrastructure layer. The platform layer's goal is to lessen the difficulty of deploying programs directly into VM containers.
5. By way of illustration, Google App Engine functions at the platform layer to
provide API support for implementing storage, databases, and business logic of
ordinary web apps.

Infrastructure Layer

1. It is a layer of virtualization where physical resources are divided into a


collection of virtual resources using virtualization technologies like Xen, KVM,
and VMware.
2. This layer serves as the Central Hub of the Cloud Environment, where
resources are constantly added utilizing a variety of virtualization techniques.
3. A base upon which to create the platform layer. constructed using the virtualized
network, storage, and computing resources. Give users the flexibility they want.
4. Automated resource provisioning is made possible by virtualization, which also
improves infrastructure management.
5. The infrastructure layer sometimes referred to as the virtualization layer,
partitions the physical resources using virtualization technologies like Xen,
KVM, Hyper-V, and VMware to create a pool of compute and storage
resources.
6. The infrastructure layer is crucial to cloud computing since virtualization
technologies are the only ones that can provide many vital capabilities, like
dynamic resource assignment.

Datacenter Layer

• In a cloud environment, this layer is responsible for Managing Physical


Resources such as servers, switches, routers, power supplies, and cooling
systems.
• Providing end users with services requires all resources to be available and
managed in data centers.
• Physical servers connect through high-speed devices such as routers and
switches to the data center.
• In software application designs, the division of business logic from the persistent
data it manipulates is well-established. This is due to the fact that the same data
cannot be incorporated into a single application because it can be used in
numerous ways to support numerous use cases. The requirement for this data to
become a service has arisen with the introduction of microservices.
• A single database used by many microservices creates a very close coupling. As
a result, it is hard to deploy new or emerging services separately if such services
need database modifications that may have an impact on other services. A data
layer containing many databases, each serving a single microservice or perhaps a
few closely related microservices, is needed to break complex service
interdependencies.

Q) What are the types of cloud computing?

Private Cloud: Here, computing resources are deployed for one particular organization and are mostly used for intra-business interactions; the computing resources are governed, owned, and operated by that same organization.
Community Cloud: Here, computing resources are provided for a community and organizations.
Public Cloud: This type of cloud is usually used for B2C (Business to Consumer) interactions. Here the computing resources are owned, governed, and operated by a government, academic, or business organization.
Hybrid Cloud: This type of cloud can be used for both type of interactions – B2B (Business to
Business) or B2C ( Business to Consumer). This deployment method is called hybrid cloud as the
computing resources are bound together by different clouds.
Comparison of IaaS, PaaS, and SaaS:

Stands for: IaaS – Infrastructure as a Service; PaaS – Platform as a Service; SaaS – Software as a Service.

Uses: IaaS is used by network architects; PaaS is used by developers; SaaS is used by the end user.

Access: IaaS gives access to resources like virtual machines and virtual storage; PaaS gives access to the runtime environment and to deployment and development tools for applications; SaaS gives access to the end user.

Model: IaaS is a service model that provides virtualized computing resources over the internet; PaaS is a cloud computing model that delivers the tools used for the development of applications; SaaS is a service model in cloud computing that hosts software to make it available to clients.

Technical understanding: IaaS requires technical knowledge; PaaS requires some knowledge for the basic setup; SaaS has no requirement about technicalities, since the company handles everything.

Popularity: IaaS is popular among developers and researchers; PaaS is popular among developers who focus on the development of apps and scripts; SaaS is popular among consumers and companies for uses such as file sharing, email, and networking.

Percentage rise: IaaS has around a 12% increment; PaaS has around a 32% increment; SaaS has about a 27% rise in the cloud computing model.

Usage: IaaS is used by skilled developers to develop unique applications; PaaS is used by mid-level developers to build applications; SaaS is used among end users, for example for entertainment.

Cloud services (examples): IaaS – Amazon Web Services, Sun, vCloud Express; PaaS – Facebook, Google search engine; SaaS – MS Office web, Facebook, and Google Apps.

Enterprise services (examples): IaaS – AWS Virtual Private Cloud; PaaS – Microsoft Azure; SaaS – IBM cloud analysis.

Outsourced cloud services (examples): IaaS – Salesforce; PaaS – Force.com, Gigaspaces; SaaS – AWS, Terremark.

User controls: IaaS – operating system, runtime, middleware, and application data; PaaS – data of the application; SaaS – nothing.

Others: IaaS is highly scalable and flexible; PaaS is highly scalable to suit different businesses according to resources; SaaS is highly scalable to suit small, mid, and enterprise-level businesses.
Q) Cloud service management policies and mechanisms?
Cloud computing is internet-based computing, providing on-demand self-service over the internet for the use of servers, storage space or disk, different platforms, and applications by any cloud user. Cloud computing services are 'pay as per your usage', based on the agreement between the Cloud Service Provider and the cloud customer. A Service Level Agreement (SLA) is a contract between the service provider and a third party such as the cloud user or a broker (agent), in which the service conditions are formally or legally defined. Often the SLA is used to define the terms of the delivery period of the service and the different performance parameters of the service to be provided by the provider.
Importance of SLA:
● The consumer can get the information about the service providers.
● SLA describes the complete information about the service and the type of services
(SaaS, PaaS, IaaS) that are provided to a particular consumer.
● SLA describes the purpose and objectives based on business level policies, which
includes the part of the service provider and the customer.
● The consumers will be able to identify the key security and management strategies of
agreement.
● SLA is used to monitor the quality of service, performance, response time from the
service point of view.
● The consumer can get the idea about the requirements for the management of the
service in case of poor performance.

A Service Level Agreement (SLA) is the bond for performance negotiated between the cloud services provider and the client. Earlier in cloud computing, all Service Level Agreements were negotiated between a client and the service provider. Nowadays, with the emergence of large, utility-like cloud computing providers, most Service Level Agreements are standardized until a client becomes a large consumer of cloud services. Service level agreements are also defined at different levels, which are mentioned below:
● Customer-based SLA
● Service-based SLA
● Multi-level SLA

TYPES OF SLA:
A service-level agreement provides a framework within which both seller and buyer of a service can pursue a profitable service business relationship. It outlines the broad understanding between the service provider and the service consumer for conducting business and forms the basis for maintaining a mutually beneficial relationship. From a legal perspective, the necessary terms and conditions that bind the service provider to provide services continually to the service consumer are formally defined in the SLA.
There are two types of SLAs from the perspective of application hosting:
1. Infrastructure SLA
2. Application SLA

Infrastructure SLA:
The infrastructure provider manages and offers guarantees on the availability of the infrastructure, namely server machines, power, network connectivity, and so on. Enterprises manage their applications, which are deployed on these server machines, themselves. The machines are leased to the customers and are isolated from the machines of other customers.

Application SLA:
In the application co-location hosting model, server capacity is made available to applications based solely on their resource demands. Hence, service providers are flexible in allocating and de-allocating computing resources among the co-located applications, and they are therefore also responsible for meeting their customers' application SLOs. For example, an enterprise can have the following application SLA with a service provider for one of its applications, as shown in Table.
Each SLA goes through a sequence of steps starting from identification of terms and conditions,
activation and monitoring of the stated terms and conditions, and eventual termination of contract
once the hosting relationship ceases to exist. Such a sequence of steps is called SLA life cycle and
consists of the following five phases:
1. Contract definition
2. Publishing and discovery
3. Negotiation
4. Operationalization
5. De-commissioning

An SLA is a legal, formal, and negotiated document that defines the service in terms of quantitative and qualitative metrics. The metrics involved in an SLA should be capable of being measured on a consistent basis, and the SLA should be evaluated against those metrics. The SLA plays a role throughout the life cycle of the service, but it cannot by itself guarantee that the consumer will receive the service as described in the SLA document. Future work is therefore focused on developing an approach to certify that the service is provided according to the specified level of quality mentioned in the SLA.
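As one example of a measurable SLA metric, the sketch below computes monthly availability from monitored downtime and checks it against an illustrative 99.9% objective. The downtime figure and the target are assumptions for illustration, not any provider's actual guarantee.

```python
# Sketch of one measurable SLA metric: monthly availability. Downtime minutes
# are assumed to come from a monitoring system; 99.9% is an illustrative
# service-level objective.
MINUTES_PER_MONTH = 30 * 24 * 60   # 43,200 minutes in a 30-day month

def availability(downtime_minutes, period_minutes=MINUTES_PER_MONTH):
    """Percentage of the period during which the service was up."""
    return 100.0 * (period_minutes - downtime_minutes) / period_minutes

def sla_met(downtime_minutes, target_percent=99.9):
    return availability(downtime_minutes) >= target_percent

print(availability(50))   # ~99.88%
print(sla_met(50))        # False -> 50 minutes of downtime breaches 99.9%
print(sla_met(30))        # True  -> the 99.9% budget is 43.2 minutes/month
```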

Chapter 3: Cloud Data Storage:


What is Cloud Storage?
Cloud storage is technology that allows us to save files in storage and then access those files via the cloud. Let's break down this definition. First, storage is the computer's ability to save files and other resources for later use. When you restart a computer, the files that are still available after the computer turns back on are saved and read from storage. Such storage commonly consists of a hard drive, a USB flash drive, or another type of drive.

How Cloud Storage Works?
Cloud storage is saving data to an off-site storage system maintained by a third party. Rather than storing information on your computer's hard drive or other local storage device, you save it to a remote database. The Internet provides the connection between your computer and the database.
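A minimal sketch of saving data to such a remote store is shown below, using the AWS SDK for Python (boto3) and Amazon S3 as one concrete example. The bucket name is a placeholder, configured AWS credentials are assumed, and other providers expose similar object-storage APIs.

```python
# Sketch of saving a local file to an off-site (cloud) store with the AWS SDK
# for Python (boto3) and Amazon S3. The bucket name is a placeholder and valid
# AWS credentials are assumed to be configured.
import boto3

s3 = boto3.client("s3")

def backup_to_cloud(local_path, bucket, key):
    """Upload a file over the Internet to the remote object store."""
    s3.upload_file(local_path, bucket, key)

def restore_from_cloud(bucket, key, local_path):
    """Download the object back to local storage."""
    s3.download_file(bucket, key, local_path)

# Example usage (requires a real bucket and credentials):
# backup_to_cloud("report.docx", "my-backup-bucket", "backups/report.docx")
```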

Cloud Storage Architecture (Cloud Computing Reference Architecture)

1. Cloud Consumer: The cloud consumer is the principal stakeholder for the cloud computing service.
A cloud consumer represents a person or organization that maintains a business relationship with,
and uses the service from a cloud provider. A cloud consumer browses the service catalog from a
cloud provider, requests the appropriate service, sets up service contracts with the cloud provider,
and uses the service. The cloud consumer may be billed for the service provisioned, and needs to
arrange payments accordingly. A cloud provider may also list in the SLAs a set of promises explicitly
not made to consumers, i.e. limitations, and obligations that cloud consumers must accept. A cloud
consumer can freely choose a cloud provider with better pricing and more favorable terms.
2. Cloud Provider A cloud provider is a person, an organization; it is the entity responsible for
making a service available to interested parties. A Cloud Provider acquires and manages the
computing infrastructure required for providing the services, runs the cloud software that provides
the services, and makes arrangement to deliver the cloud services to the Cloud Consumers through
network access. For Software as a Service, the cloud provider deploys, configures, maintains and
updates the operation of the software applications on a cloud infrastructure so that the services are
provisioned at the expected service levels to cloud consumers.
3. Cloud Auditor A cloud auditor is a party that can perform an independent examination of cloud
service controls with the intent to express an opinion thereon. Audits are performed to verify
conformance to standards through review of objective evidence. A cloud auditor can evaluate the
services provided by a cloud provider in terms of security controls, privacy impact, performance, etc.
For security auditing, a cloud auditor can make an assessment of the security controls in the
information system to determine the extent to which the controls are implemented correctly,
operating as intended, and producing the desired outcome with respect to the security requirements
for the system.
4. Cloud Broker
As cloud computing evolves, the integration of cloud services can be too complex for cloud consumers
to manage. A cloud consumer may request cloud services from a cloud broker instead of contacting a
cloud provider directly. A cloud broker is an entity that manages the use, performance and delivery of
cloud services and negotiates relationships between cloud providers and cloud consumers. In general,
a cloud broker can provide services in three categories [9]:
Service Intermediation: A cloud broker enhances a given service by improving some specific
capability and providing value-added services to cloud consumers.
Service Aggregation: A cloud broker combines and integrates multiple services into one or more new
services.
Service Arbitrage: Similar to aggregation, except that the services being aggregated are not fixed; the
broker can flexibly choose services from multiple providers.
5. Cloud Carrier
A cloud carrier acts as an intermediary that provides connectivity and transport of cloud services
between cloud consumers and cloud providers. Cloud carriers provide access to consumers through
network, telecommunication and other access devices.

Types of Cloud Storage
There are four main types of cloud storage:
1. Public cloud
2. Private cloud
3. Community cloud
4. Hybrid cloud

Public Cloud
Public cloud storage is where the enterprise and the storage service provider are separate and no
cloud resources are stored in the enterprise's data center. The cloud storage provider fully manages
the enterprise's public cloud storage.

Private Cloud
Companies that look for cost efficiency and greater control over data and resources will find the
private cloud a more suitable choice. The private cloud offers greater opportunity for customization
to meet a specific organization's requirements.

Community Cloud
The community cloud operates in a way that is similar to the public cloud, with one difference: it
allows access only to a specific set of users who share common objectives and use cases. This type of
cloud deployment model can be managed and hosted internally or by a third-party vendor.

Hybrid Cloud
A hybrid cloud is a combination of two or more cloud architectures; it also enables cloud bursting,
where workloads spill over from a private cloud to a public cloud during demand spikes. While each
model in the hybrid cloud functions differently, they are all part of the same architecture. As part of
this deployment model, internal or external providers can offer resources. A company will prefer to
store critical data on a private cloud, while less sensitive data can be stored on a public cloud.

Risks of Cloud Storage
1. Requires a high-speed internet connection most of the time.
2. Data is stored on third-party servers.
3. When a provider closes its service for maintenance, you may find it troublesome to access your data.
4. If your provider closes its service permanently, you may lose your valuable data.
5. Premium services cost a considerable amount for the storage volume.

Advantages of Cloud Storage
1. Usability: Most cloud storage services provide desktop folders for Macs and PCs, which allow
users to drag and drop files between the cloud storage and their local storage.
2. Bandwidth: You can avoid emailing files to individuals and instead send a web link to recipients
through your email.
3. Accessibility: Stored files can be accessed from anywhere with an Internet connection.
4. Disaster Recovery: It is highly recommended that businesses have an emergency backup plan
ready in case of an emergency. Cloud storage can serve as a backup plan by providing a second copy
of important files.
5. Cost Savings: Businesses and organizations can often reduce annual operating costs by using
cloud storage; cloud storage costs roughly 3 cents per gigabyte, often less than storing data internally.
Disadvantages of Cloud Storage
1. Usability: Be careful when using drag/drop to move a document into the cloud storage folder.
This will permanently move your document from its original folder to the cloud storage location.
2. Bandwidth: Several cloud storage services have a specific bandwidth allowance.
3. Accessibility: If you have no internet connection, you have no access to your data.
4. Data Security: There are concerns with the safety and privacy of important data stored remotely.
The possibility of private data commingling with other organizations makes some businesses uneasy.
5. Software: If you want to be able to manipulate your files locally through multiple devices, you’ll
need to download the service on all devices.

Chapter 4: Data management using Cloud Computing:

What is data pipelining?


A data pipeline is a process that moves data from one system or format to another. A data pipeline
typically includes a series of steps for extracting data from a source, transforming and cleaning it, and
loading it into a destination system, such as a database or a data warehouse. Data pipelines can be
used for a variety of purposes, including data integration, data warehousing, automating data
migration, and analytics.

What is the purpose of data pipelining?


The data pipeline is a key element in the overall data management process. Its purpose is to automate
and scale repetitive data flows and the associated data collection, transformation and integration tasks.
A properly constructed data pipeline can accelerate the processing that's required as data is gathered,
cleansed, filtered, enriched and moved to downstream systems and applications. Well-designed
pipelines also enable organizations to take advantage of big data assets that often include large
amounts of structured, unstructured and semi-structured data. In many cases, some of that is real-time
data generated and updated on an ongoing basis. As the volume, variety and velocity of data continue
to grow in big data systems, the need for data pipelines that can scale linearly -- whether in
on-premises, cloud or hybrid cloud environments -- is becoming increasingly critical to analytics
initiatives and business operations.

Who needs data pipelining?


A data pipeline is needed for any analytics application or business process
that requires regular aggregation, cleansing, transformation and distribution
of data to downstream data consumers. Typical data pipeline users include
the following:
1) Data scientists and other members of data science teams.
2) Business intelligence (BI) analysts and developers.
3) Business analysts.
4) Senior management and other business executives.
5) Marketing and sales teams.
6) Operational workers.
To make it easier for business users to access relevant data, pipelines can
also be used to feed it into BI dashboards and reports, as well as
operational monitoring and alerting systems.

How a data pipeline works


The data pipeline development process starts by defining what, where and how data is generated or
collected. That includes capturing source system characteristics, such as data formats, data structures,
data schemas and data definitions -- information that's needed to plan and build a pipeline. Once it's
in place, the data pipeline typically involves the steps listed further below.
Many data pipelines are built by data engineers or big data engineers. To create effective pipelines,
it's critical that they develop their soft skills -- meaning their interpersonal and communication skills.
This helps them collaborate with data scientists, other analysts and business stakeholders to identify
user requirements and the data needed to meet them before launching a data pipeline development
project. Such skills are also necessary for ongoing conversations to prioritize new development plans
and manage existing data pipelines.

Other best practices for data pipelines include the following:

1) Manage the development of a data pipeline as a project, with defined goals and delivery dates.
2) Document data lineage information so the history, technical attributes and business meaning of
data can be understood.
3) Ensure that the proper context of data is maintained as it's transformed in a pipeline.
4) Create reusable processes or templates for data pipeline steps to streamline development.
5) Avoid scope creep that can complicate pipeline projects and create unrealistic expectations among
users.

The typical steps in a data pipeline are the following:

1. Data ingestion. Raw data from one or more source systems is ingested into the data pipeline.
Depending on the data set, data ingestion can be done in batch or real-time mode.
2. Data integration. If multiple data sets are being pulled into the pipeline for use in analytics or
operational applications, they need to be combined through data integration processes.
3. Data cleansing. For most applications, data quality management measures are applied to the raw
data in the pipeline to ensure that it's clean, accurate and consistent.
4. Data filtering. Data sets are commonly filtered to remove data that isn't needed for the particular
applications the pipeline was built to support.
5. Data transformation. The data is modified as needed for the planned applications. Examples of
data transformation methods include aggregation, generalization, reduction and smoothing.
6. Data enrichment. In some cases, data sets are augmented and enriched as part of the pipeline
through the addition of more data elements required for applications.
7. Data validation. The finalized data is checked to confirm that it is valid and fully meets the
application requirements.
8. Data loading. For BI and analytics applications, the data is loaded into a data store so it can be
accessed by users. Typically, that's a data warehouse, a data lake or a data lakehouse, which combines
elements of the other two platforms.
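
To make these stages concrete, here is a minimal sketch in Python that chains ingestion, cleansing,
filtering, transformation, validation and loading over a handful of in-memory records; the field names
and values are invented for illustration, and printing stands in for loading into a real warehouse or lake.

    # A minimal sketch of the pipeline stages described above.
    def ingest():
        # Step 1: raw records as they might arrive from a source system.
        return [
            {"id": 1, "amount": "100.5", "region": "EU "},
            {"id": 2, "amount": None,    "region": "US"},
            {"id": 3, "amount": "-7.25", "region": "APAC"},
        ]

    def cleanse(records):
        # Step 3: drop incomplete rows, normalise types and whitespace.
        cleaned = []
        for r in records:
            if r["amount"] is None:
                continue
            cleaned.append({"id": r["id"],
                            "amount": float(r["amount"]),
                            "region": r["region"].strip()})
        return cleaned

    def filter_rows(records):
        # Step 4: keep only the data the downstream application needs.
        return [r for r in records if r["amount"] > 0]

    def transform(records):
        # Step 5: aggregate amounts per region.
        totals = {}
        for r in records:
            totals[r["region"]] = totals.get(r["region"], 0.0) + r["amount"]
        return totals

    def validate(totals):
        # Step 7: a simple sanity check before loading.
        assert all(v >= 0 for v in totals.values()), "negative totals found"
        return totals

    def load(totals):
        # Step 8: printing stands in for writing to a warehouse or lake.
        print(totals)

    load(validate(transform(filter_rows(cleanse(ingest())))))

In a production pipeline each of these functions would typically read from and write to external
systems and be run by a scheduler or orchestration tool rather than called in a single expression.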

Many data pipelines also apply machine learning and neural network algorithms to create more
advanced data transformations and enrichments. This includes segmentation, regression analysis,
clustering and the creation of advanced indices and propensity scores. In addition, logic and
algorithms can be built into a data pipeline to add intelligence.
As machine learning -- and, especially, automated machine learning (AutoML) -- processes become
more prevalent, data pipelines likely will become increasingly intelligent. With these processes,
intelligent data pipelines could continuously learn and adapt based on the characteristics of source
systems, required data transformations and enrichments, and evolving business and application
requirements.
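
As a hedged illustration of this kind of enrichment, the sketch below adds a clustering-based customer
segment to pipeline records; it assumes scikit-learn is installed, and the customer fields and values are
invented.

    # Enriching pipeline records with a cluster label (customer segmentation).
    from sklearn.cluster import KMeans

    records = [
        {"customer": "a", "orders": 2,  "spend": 40.0},
        {"customer": "b", "orders": 25, "spend": 900.0},
        {"customer": "c", "orders": 3,  "spend": 55.0},
        {"customer": "d", "orders": 30, "spend": 1100.0},
    ]

    features = [[r["orders"], r["spend"]] for r in records]
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)

    # The segment label becomes an extra data element added during enrichment.
    for record, label in zip(records, labels):
        record["segment"] = int(label)
    print(records)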

There are several types of data pipeline architecture, each with its own set of characteristics and use
cases. Some of the most common types include:
1) Batch Processing: Data is processed in batches at set intervals, such as daily or weekly.
2) Real-Time Streaming: Data is processed as soon as it is generated, with minimal delay.
3) Lambda Architecture: A combination of batch and real-time processing, where data is first
processed in batch and then updated in real time.
4) Kappa Architecture: Similar to the Lambda architecture, but data is processed only once and all
data is ingested in real time.
5) Microservices Architecture: Data is processed using loosely coupled, independently deployable
services.
6) ETL (Extract, Transform, Load) Architecture: Data is extracted from various sources, transformed
to fit the target system, and loaded into the target system.
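
The sketch below, using only plain Python, illustrates the difference between the first two styles: the
batch function processes a complete data set at once, while the streaming function updates its result
as each simulated event arrives; the event values are invented.

    # Batch vs. real-time streaming, with a generator standing in for a stream.
    import time

    def batch_pipeline(records):
        # Batch: process the whole data set at once, on a schedule.
        return sum(records) / len(records)

    def streaming_pipeline(stream):
        # Streaming: update the result as each event arrives.
        total, count = 0.0, 0
        for event in stream:
            total += event
            count += 1
            print(f"running average after {count} events: {total / count:.2f}")

    def event_stream():
        for value in [10, 20, 30]:
            time.sleep(0.1)   # simulate events arriving over time
            yield value

    print("batch result:", batch_pipeline([10, 20, 30]))
    streaming_pipeline(event_stream())
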
A data pipeline architecture is essential for several reasons:
1) Scalability: Data pipeline architecture should allow for the efficient processing of large amounts
of data, enabling organizations to scale their data processing capabilities as their data volume
increases.
2) Reliability: A well-designed data pipeline architecture ensures that data is processed accurately
and reliably. This reduces the risk of errors and inaccuracies in the data.
3) Efficiency: Data pipeline architecture streamlines the data processing workflow, making it more
efficient and reducing the time and resources required to process data.
4) Flexibility: It allows for the integration of different data sources and the ability to adapt to
changing business requirements.
5) Security: Data pipeline architecture enables organizations to implement security measures, such
as encryption and access controls, to protect sensitive data.
6) Data Governance: Data pipeline architecture allows organizations to implement data governance
practices such as data lineage, data quality, and data cataloging that help maintain data accuracy,
completeness, and reliability.

Data pipelines can be compared to the plumbing system in the real world.
Both are crucial channels that meet basic needs, whether it’s moving data
or water. Both systems can malfunction and require maintenance. In many
companies, a team of data engineers will design and maintain data
pipelines.
Data pipelines should be automated as much as possible to
reduce the need for manual supervision. However, even with data
automation, businesses may still face challenges with their data pipelines:
1) Complexity: In large companies, there could be a large number of
data pipelines in operation. Managing and understanding all these
pipelines at scale can be difficult, such as identifying which pipelines
are currently in use, how current they are, and what dashboards or
reports rely on them. In an environment with multiple data pipelines,
tasks such as complying with regulations and migrating to the cloud
can become more complicated.
2) Cost: Building data pipelines at a large scale can be costly.
Advancements in technology, migration to the cloud, and demands
for more data analysis may all require data engineers and developers
to create new pipelines. Managing multiple data pipelines may lead
to increased operational expenses as time goes by.
3) Efficiency: Data pipelines may lead to slow query performance depending on how data is
replicated and transferred within an organization. When there are many simultaneous requests or
large amounts of data, pipelines can become slow, particularly in situations that involve multiple data
replicas or use data virtualization techniques.
What are data pipeline design patterns?
Data pipeline design patterns are templates used as a foundation for creating data pipelines. The
choice of design pattern depends on various factors, such as how data is received, the business use
cases, and the data volume. Some common design patterns include:
1) Raw Data Load: This pattern involves moving and loading raw data from one location to another,
such as between databases or from an on-premises data center to the cloud. It focuses only on the
extraction and loading process and can be slow and time-consuming with large data volumes. It
works well for one-time operations but is not suitable for recurring situations.
2) Extract, Transform, Load (ETL): This is a widely used pattern for loading data into data
warehouses, lakes, and operational data stores. It involves the extraction, transformation, and loading
of data from one location to another. However, most ETL processes use batch processing, which can
introduce latency to operations.
3) Streaming ETL: Similar to the standard ETL pattern but with data streams as the origin, this
pattern uses tools like Apache Kafka or StreamSets Data Collector Engine for the complex ETL
processes.
4) Extract, Load, Transform (ELT): This pattern is similar to ETL, but the transformation happens
after the data is loaded into the target destination, which can reduce latency. However, this design can
affect data quality and violate data privacy rules.
5) Change Data Capture (CDC): This pattern introduces freshness to data processed using the ETL
batch processing pattern by detecting changes that occur during the ETL process and sending them to
message queues for downstream processing.
6) Data Stream Processing: This pattern is suitable for feeding real-time data to high-performance
applications such as IoT and financial applications. Data is continuously received from devices,
parsed and filtered, processed, and sent to various destinations like dashboards for real-time
applications.
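
As a rough, in-memory sketch of the change data capture idea described above, the code below pushes
only rows whose update timestamp is newer than the last processed watermark onto a queue; the table
rows, watermark value and queue are invented stand-ins for a real database and a message broker
such as Kafka.

    # Hedged sketch of the CDC pattern with in-memory stand-ins.
    source_table = [
        {"id": 1, "name": "alice", "updated_at": 100},
        {"id": 2, "name": "bob",   "updated_at": 205},
        {"id": 3, "name": "carol", "updated_at": 310},
    ]

    message_queue = []          # stands in for Kafka or another queue
    last_processed = 200        # watermark from the previous pipeline run

    def capture_changes(table, watermark):
        # Detect rows changed since the last run and advance the watermark.
        changes = [row for row in table if row["updated_at"] > watermark]
        new_watermark = max([row["updated_at"] for row in changes],
                            default=watermark)
        return changes, new_watermark

    changes, last_processed = capture_changes(source_table, last_processed)
    message_queue.extend(changes)    # downstream consumers read from here
    print(message_queue, last_processed)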

Difference between ETL and a data pipeline


Both data pipelines and ETL are responsible for transferring data between sources and storage
solutions, but they do so in different ways.
An ETL pipeline refers to a set of integration-related batch processes that run on a scheduled basis.
ETL jobs extract data from one or more systems, do basic data transformations and load the data into
a repository for analytics or operational uses.
A data pipeline, on the other hand, involves a more advanced set of data processing activities for
filtering, transforming and enriching data to meet user needs, and can work with ongoing data streams
in real time.
As mentioned above, a data pipeline can handle batch processing but can also run in real-time mode,
either with streaming data or triggered by a predetermined rule or set of conditions. As a result, an
ETL pipeline can be seen as one form of a data pipeline.

Difference between ETL and ELT


ETL focuses more on individual "batches" of data for more specific purposes. It transforms data
before loading it into the data warehouse and is best for predefined applications with known
transformation requirements.

ELT transforms data after loading it into the data warehouse and is best for supporting a wide range
of applications with different transformation requirements.
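
A hedged sketch of this contrast, using Python's built-in sqlite3 module as a stand-in for a data
warehouse (the table and column names are invented): in the ETL branch the aggregation happens in
Python before loading, while in the ELT branch the raw rows are loaded first and the transformation
runs as SQL inside the "warehouse".

    import sqlite3

    raw_rows = [("eu", "100.5"), ("us", "20.0"), ("eu", "49.5")]
    db = sqlite3.connect(":memory:")

    # ETL: transform first (cast and aggregate in Python), then load the result.
    totals = {}
    for region, amount in raw_rows:
        totals[region] = totals.get(region, 0.0) + float(amount)
    db.execute("CREATE TABLE etl_totals (region TEXT, total REAL)")
    db.executemany("INSERT INTO etl_totals VALUES (?, ?)", totals.items())

    # ELT: load the raw data as-is, then transform inside the warehouse with SQL.
    db.execute("CREATE TABLE raw_sales (region TEXT, amount TEXT)")
    db.executemany("INSERT INTO raw_sales VALUES (?, ?)", raw_rows)

    for row in db.execute("SELECT * FROM etl_totals"):
        print("etl:", row)
    for row in db.execute(
            "SELECT region, SUM(CAST(amount AS REAL)) FROM raw_sales GROUP BY region"):
        print("elt:", row)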

Q) What is an ETL pipeline?
An ETL pipeline is a set of integration-related batch processes that extract data from one or more
source systems, apply transformations, and load the result into a repository such as a data warehouse
for analytics or operational uses.

Q) Can you describe the components of a typical data pipeline?

1. Storage
One of the first components of a data pipeline is storage.
Storage provides the foundation for all other components, as it sets up the pipeline for success. It
simply acts as a place to hold big data until the necessary tools are available to perform more in-
depth tasks. The main function of storage is to provide cost-effective large-scale storage that scales
as the organization’s data grows.

2. Preprocessing
The next component of a data pipeline is preprocessing.
This part of the process prepares big data for analysis and creates a controlled environment for
downstream processes.
The goal of preprocessing is to "clean up" data, which means correcting dirty inputs, unraveling
messy data structures, and transforming unstructured information into a structured format (like
putting all customer names in the same field rather than keeping them in separate fields). It also
includes identifying and tagging relevant subsets of the data for different types of analysis.

3. Analysis
The third component of a data pipeline is analysis, which provides useful insights into the collected
information and makes it possible to compare new data with existing big data sets. It also helps
organizations identify relationships between variables in large datasets to eventually create models
that represent real-world processes.
4. Applications
The fourth component of a data pipeline is applications, which are specialized tools that provide the
necessary functions to transform processed data into valuable information. Software such as business
intelligence (BI) tools can help customers quickly build applications on top of their data. For
example, an organization may use statistical software to analyze big data and generate reports for
business intelligence purposes.

5. Delivery
The final component of a data pipeline is delivery, which is the final presentation piece used to
deliver valuable information to those who need it. For example, a company may use web-based
reporting tools, SaaS applications or a BI solution to deliver the content to the people who need it.

CHAPTER 5: VIRTUALIZATION, CONTAINERIZATION & ELASTICITY IN CLOUD COMPUTING:

What is Docker, and why is it used for containerization?


1) Docker is a containerization platform that is used to package your application and all its
dependencies together in the form of containers, so that your application works seamlessly in any
environment, whether in development, testing, or production.
2) Docker is a tool designed to make it easier to create, deploy, and run applications by using
containers.
3) Docker is the world's leading software container platform. It was launched in 2013 by a company
called Dotcloud, Inc, which was later renamed Docker, Inc. It is written in the Go language.
Docker's architecture consists of the Docker client, the Docker daemon running on the Docker host,
and the Docker Hub registry. Docker has a client-server architecture in which the client
communicates with the Docker daemon on the Docker host using a REST API, over UNIX sockets or
a network interface.
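
The sketch below illustrates this client/daemon split, assuming the Docker SDK for Python (the
"docker" package) is installed and a Docker daemon is running on the host; the client object simply
sends API requests that the daemon carries out.

    import docker

    # The client sends API requests; the daemon on the Docker host does the work.
    client = docker.from_env()

    # Ask the daemon to pull an image and run a short-lived container from it.
    output = client.containers.run("alpine", "echo hello from a container",
                                   remove=True)
    print(output.decode().strip())

    # List images known to the daemon (stored locally, pulled from a registry).
    for image in client.images.list():
        print(image.tags)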

What are the components of Docker?


1) Docker Clients and Servers: Docker has a client-server architecture. The Docker daemon/server
manages all containers. The Docker daemon/server receives requests from the Docker client through
the CLI or REST APIs and processes them accordingly. The Docker client and daemon can be present
on the same host or on different hosts.
2) Docker Images: Docker images are read-only templates used to build Docker containers.
3) Docker File: A Dockerfile is a text file that contains a series of instructions on how to build your
Docker image.
4) Docker Registries: A Docker registry is a storage component for Docker images. We can store the
images in either public or private repositories so that multiple users can collaborate in building the
application.
5) Docker Containers: Docker containers are runtime instances of Docker images. Containers contain
the whole kit required for an application, so the application can run in an isolated way.

Advantages of Docker
1. Speed: Docker containers are very fast compared to virtual machines. The time required to build a
container is very short because containers are tiny and lightweight.
2. Portability: Applications built inside Docker containers are extremely portable. These portable
applications can easily be moved anywhere as a single element, and their performance remains the
same.
3. Scalability: Docker can be deployed on several physical servers, data servers, and cloud platforms,
and it can run on every Linux machine. Containers can easily be moved from a cloud environment to
a local host and back to the cloud again at a fast pace.
4. Density: Docker uses the available resources more efficiently because it does not use a hypervisor.
For this reason, more containers can be run on a single host than virtual machines. Docker containers
have higher performance because of their high density and no overhead wastage of resources.
Explain the difference between a Docker container and a virtual machine.
A VM lets you run a virtual machine on any hardware. Docker lets you run an application on any
operating system. It uses isolated user-space instances known as containers.
Docker containers have their own file system, dependency structure, processes, and network
capabilities. The application has everything it requires inside the container and can run anywhere.
Docker container technology uses the underlying host operating system resources directly.

What is containerization?
Containerization is a method of virtualizing an operating system so that multiple isolated
applications can run on a single host operating system.
Containerization is a method of virtualization that packages applications and their dependencies into
isolated, self-contained containers, allowing them to run securely and independently from each other
on the same host. It provides an efficient way to
deploy, manage, and scale applications across different platforms.
For example, Docker is a popular form of containerization that allows software
developers to package their applications into standardized isolated containers. Docker makes it easier
for applications to run on any system, regardless of its underlying infrastructure. The most
significant benefit of containerization is increased efficiency. Containers allow applications to run in
isolated, secure environments, improving resource utilization and allowing for more flexibility in
deployment. Additionally, containers make deploying, scaling, and managing applications easier,
resulting in improved
operational agility. The steps of containerization are:
1) Package the application and its dependencies into a standard file format, such as a Docker image.
2) Deploy the packaged application and its dependencies into a container.
3) Execute the containerized application in the container runtime environment.
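
A hedged sketch of these three steps, driving the standard Docker CLI from Python: the Dockerfile
contents, the app.py script and the image name "demo-app" are invented for illustration, and Docker
must be installed for the commands to run.

    import pathlib
    import subprocess

    # 1) Package the application and its dependencies into an image.
    pathlib.Path("app.py").write_text('print("hello from inside a container")\n')
    pathlib.Path("Dockerfile").write_text(
        "FROM python:3.11-slim\n"
        "COPY app.py /app.py\n"
        'CMD ["python", "/app.py"]\n'
    )
    subprocess.run(["docker", "build", "-t", "demo-app", "."], check=True)

    # 2) + 3) Deploy and execute the packaged application as a container.
    subprocess.run(["docker", "run", "--rm", "demo-app"], check=True)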

Difference between a repository and a registry


The terms repository and registry may be easily confused when talking
about containers. A container repository is used to store related images for
setup and deployment. Container repositories can be used to manage, pull
or push images.
Container registries store multiple repositories of container images, as well as API paths and access
control rules. Container registries can also be hosted publicly or privately.

What is a container registry?


A container registry is a collection of repositories made to store container images. A container image
is a file comprised of multiple layers which can execute applications in a single instance.
Types of registries
A container registry is a type of tool that can host and distribute container images.
A container image is a binary file that serves as the blueprint for executing applications as
containers. Container images aren’t containers themselves; to create a container, you have to run a
container based on a container image. But container images tell your container runtime which
processes to execute when it starts a container.

Thus, the role of a container registry is not to run containers, but rather to provide an efficient,
centralized solution for storing the data that is necessary for running containers. By allowing teams
to host a virtually unlimited number of container images in a single place, container registries make
it easy for developers to publish their applications as container images, and for users to access those
images.
1) Public container registries are generally the faster and easier route when initiating a container
registry.
2) Public registries are also seen as easier to use.
3) However, they may be less secure than private registries.
4) They suit smaller teams and work well for standard and open-source images from public registries.

1) A private container registry is set up by the organization using it.
2) Private registries are either hosted or on-premises and are popular with larger organizations or
enterprises that are more committed to using a container registry.
3) Having complete control over the registry in development allows an organization more freedom in
how it chooses to manage it.
4) Private registries are seen as the more secure option.

What is container security?


Public containers are seen as less secure because individual container images may contain malicious
or outdated code which, if it goes unpatched, could lead to a data breach. It may also be unknown who
has read or write access to an image.
If an organization's priority is security when it comes to container registries, then it should implement
a private registry. Other security approaches to container registries include:
1) Assigning role-based access control (RBAC).
2) Scanning for vulnerabilities in images.
3) Digitally signing images to ensure each image is trusted.
4) Using authentication methods such as access tokens or JSON key files, similar to how Google's
container registry works.
5) Using Identity and Access Manager (IAM) settings, like IBM's Cloud Container Registry does.
Explain elastic resources
1) Elastic resources are applications and infrastructure that can be summoned on demand when traffic
or workloads get high.
2) Cloud computing businesses such as AWS and Google Cloud rely on elastic resources as a business
model, billing customers on demand like a utility bill.
3) This supply-side and demand-side economic cycle is the underpinning of the cloud ecosystem.
A good example of an elastic resource is an EC2 server. If a business only requires 2 servers to run its
website but sees a holiday traffic spike, it can simply allocate additional elastic resources by
increasing the EC2 servers from 2 to 4 to handle the holiday traffic load. Once that traffic dies down,
it can deprovision the servers back to 2. This is an elastic resource.
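
A hedged sketch of this example using boto3, AWS's Python SDK: the Auto Scaling group name
"web-asg" is hypothetical, and valid AWS credentials and an existing Auto Scaling group are assumed.

    import boto3

    autoscaling = boto3.client("autoscaling", region_name="us-east-1")

    # Scale out from 2 to 4 instances ahead of the holiday traffic spike...
    autoscaling.set_desired_capacity(AutoScalingGroupName="web-asg",
                                     DesiredCapacity=4)

    # ...and scale back in once the spike has passed, so billing stays usage-based.
    autoscaling.set_desired_capacity(AutoScalingGroupName="web-asg",
                                     DesiredCapacity=2)
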
Chapter 6: Managed Machine Learning Systems:
Q) Compare commercial and open source ML systems:
