Cloud Computing For Data Science

unit 1 for AML diploma

Uploaded by

dhanashree

Cloud Computing for Data Science

Unit 1 :- Cloud Computing Fundamentals (12 Marks)


1.1 Introduction to cloud computing :-
• Cloud computing refers to “servers” that are located at a remote site and accessed over the internet.
Definition :-
• In simple terms, it means storing, managing and accessing data and programs on remote servers hosted on the internet instead of on a computer’s local hard drive.

OR

Cloud computing is the on-demand availability of computer resources (especially data storage / cloud storage & computing power) without direct active management by the user.

In short, we store, manage & process data on remote servers.

1. Service Providers :-
1. Google Cloud
2. AWS (Amazon Web Services)
3. Microsoft Azure
4. IBM Cloud
5. Alibaba Cloud, etc.
2. Types of Cloud :-
1. Public :- Services are accessible to the general public over the internet.
2. Private :- Servers are accessible only within a single organization.
3. Hybrid :- Combines public and private cloud computing features.
4. Community :- Services are accessible by a specific group of organizations.

Evolution of cloud computing :-

1. Distributed System :-
1. It is a composition of multiple independent systems, but all of them are presented to users as a single entity.
2. The purpose of a distributed system is to share resources and to use them effectively and efficiently.
3. Distributed systems possess characteristics such as scalability, concurrency, continuous availability, heterogeneity and independence of failures.
4. But the main problem with this system was that all systems were required to be present in the same geographical location.
5. To solve this problem, distributed computing led to three more types of computing:
• Mainframe computing :-
   - It came into existence in 1951. A mainframe is a highly powerful and reliable computer.
   - It is responsible for handling large volumes of data, such as massive input/output operations.
   - These systems have almost no downtime and high fault tolerance.
   - They are very expensive.
   - To reduce this cost, cluster computing and grid computing came as alternatives to mainframe technology.

• Cluster Computing :-
   - Nodes must be homogeneous: they should have the same type of hardware and operating system.
   - Computers are located close to each other.
   - Computers are connected by a high-speed local area network bus.
   - Computers are connected in a centralized network topology.
   - Scheduling is controlled by a central server.
   - The whole system has a centralized resource manager.
   - The whole system functions as a single system.
   - It is used in WebLogic application servers, databases, etc.
• Grid Computing :-
   - Nodes may have different operating systems and hardware; machines can be homogeneous or heterogeneous.
   - Computers may be located at a huge distance from one another.
   - Computers are connected using a low-speed bus or the internet.
   - Computers are connected in a distributed or decentralized network topology.
   - It has many servers, but mostly each node behaves independently.
   - Every node is autonomous, and any node can opt out at any time.
   - It is used in predictive modelling, automation, simulations, etc.

2. Virtualization :-

• It came into existence about 40 years ago, and it has become a core technique used in IT firms today.
• It employs a software layer over the hardware, and through this layer it provides customers with cloud-based services.

3. Web 2.0 :-
• It lets users generate their own content and collaborate with other people, or share information using social media.
• Web 2.0 is the second generation of the World Wide Web (WWW) combined with web services, and it is the type of computing used today.

4. Service Orientation :-

• It acts as a reference model for cloud computing. It supports low-cost, flexible and evolvable applications.
• Two important concepts were introduced in this computing model: Quality of Service (QoS), which also includes the Service Level Agreement (SLA), and Software as a Service (SaaS).
5. Utility Orientation :-

• It is a computing model that defines service provisioning techniques for compute services along with other major services such as storage and infrastructure, which are provisioned on a pay-per-use basis.
Service Oriented Architecture :-
1. Service-Oriented Architecture (SOA) is an architectural approach in which applications are developed using services that are available over the internet.
2. These services are combined through the internet to form applications.

SOA vs Web Services

SOA is an architectural concept that focuses on having different services communicate with each other to carry out a bigger job. A web service is a basic building block of an SOA: when multiple services are combined, the resulting application falls under SOA.

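Example :-

• Google Docs

There are two major roles within Service-Oriented Architecture :

• Service Provider :- creates and maintains a service and makes it available to others.
• Service Consumer :- uses (invokes) the services offered by providers.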
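To make the provider and consumer roles concrete, here is a minimal Python sketch. The two provider services (`InventoryService`, `PricingService`) and the consumer (`OrderApplication`) are hypothetical, and they call each other in-process for simplicity; in a real SOA each service would run separately and be reached over the network through its published interface.

```python
# Minimal SOA sketch: small independent services, each with one narrow job,
# are combined by a consumer into a bigger application. All names and data are hypothetical.

class InventoryService:
    """Service provider: reports how many units of an item are in stock."""
    def __init__(self, stock: dict[str, int]):
        self._stock = stock

    def units_available(self, item: str) -> int:
        return self._stock.get(item, 0)


class PricingService:
    """Service provider: returns the unit price of an item."""
    def __init__(self, prices: dict[str, float]):
        self._prices = prices

    def unit_price(self, item: str) -> float:
        return self._prices[item]


class OrderApplication:
    """Service consumer: combines both services to carry out a bigger job (quoting an order)."""
    def __init__(self, inventory: InventoryService, pricing: PricingService):
        self.inventory = inventory
        self.pricing = pricing

    def quote(self, item: str, quantity: int) -> float:
        available = self.inventory.units_available(item)
        if available < quantity:
            raise ValueError(f"only {available} units of {item} in stock")
        return quantity * self.pricing.unit_price(item)


app = OrderApplication(InventoryService({"laptop": 5}), PricingService({"laptop": 750.0}))
print(app.quote("laptop", 2))  # 1500.0
```

Because the consumer only depends on what each service exposes, either provider could be replaced by another implementation without changing `OrderApplication`.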
Web Services :-

1. In the context of cloud computing, web services are software components or applications that provide interoperable communication over the internet.
2. These services enable different systems and applications to communicate, interact and exchange data seamlessly, regardless of the underlying platforms, technologies or programming languages.
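A minimal sketch of the idea that web services exchange data in a platform-neutral format. It assumes JSON as the message format and uses a hypothetical `handle_request` function in place of a real HTTP endpoint; the HTTP transport itself is left out so the example stays self-contained.

```python
import json

def handle_request(body: str) -> str:
    """Hypothetical web-service handler: receives JSON text, returns JSON text."""
    request = json.loads(body)
    celsius = float(request["celsius"])
    response = {"celsius": celsius, "fahrenheit": celsius * 9 / 5 + 32}
    return json.dumps(response)

# The client only has to produce and parse JSON text, so it could be
# written in any language and run on any platform.
request_body = json.dumps({"celsius": 100})
response = json.loads(handle_request(request_body))
print(response["fahrenheit"])  # 212.0
```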

Grid Computing :-
1. Grid computing is a collection of computer resources from multiple locations working to reach a common goal.
2. The grid can be thought of as a distributed system with non-interactive workloads that involve a large number of files.
3. A computational grid is a software and hardware infrastructure that provides dependable, consistent and inexpensive access to high-end computational capabilities.
4. It links together computing resources (PCs, workstations, servers, storage elements) and provides a mechanism to access them.
• How Grid Computing works :- In general, a grid computing system requires:
1. At least one computer, usually a server, which handles all the administrative duties of the system.
2. A network of computers running special grid computing network software.
3. A collection of computer software called middleware.
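The coordinator and worker structure described above can be sketched in Python. This is an illustration under stated assumptions: threads from `concurrent.futures.ThreadPoolExecutor` stand in for the networked worker machines, the executor plays the role of the middleware, and the prime-counting job is a hypothetical example of a large, non-interactive workload. Because of Python's global interpreter lock, threads will not actually speed up this CPU-bound work; the point is the structure (split, distribute, combine).

```python
from concurrent.futures import ThreadPoolExecutor

def count_primes(start: int, stop: int) -> int:
    """Work unit run by one 'node': count the primes in [start, stop)."""
    count = 0
    for n in range(max(start, 2), stop):
        # n is prime if no d in 2..sqrt(n) divides it
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def run_on_grid(limit: int, nodes: int) -> int:
    """Coordinator: split [0, limit) into chunks, hand them to workers, combine the partial counts."""
    chunk = (limit + nodes - 1) // nodes          # ceiling division so every number is covered
    starts = range(0, limit, chunk)
    stops = [min(s + chunk, limit) for s in starts]
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        return sum(pool.map(count_primes, starts, stops))

print(run_on_grid(100, nodes=4))  # 25
```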

Utility Computing :-
1. Utility computing is a service provisioning model that offers computing resources such as hardware, software and network bandwidth to clients as and when they require them, on an on-demand basis. The service provider charges only for the services actually consumed, rather than a fixed charge or flat rate.
2. Utility computing is a subset of cloud computing.
3. In this model, computing resources such as processing power, storage and applications are provided to users on demand, much like a traditional utility such as electricity or water. Users typically pay for these resources on a metered basis, meaning they are charged for the actual amount of resources they consume rather than a flat fee.
4. This model offers several advantages, including scalability, flexibility and cost-efficiency. Users can easily scale their computing resources up or down based on their current needs, without having to invest in and manage their own infrastructure. Additionally, users can access these resources from anywhere with an internet connection, which makes the model particularly attractive for businesses with dynamic or unpredictable computing needs.
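Metered billing reduces to multiplying each quantity consumed by its unit price and adding the results. The sketch below assumes made-up resource names and rates (`RATES`); real providers publish their own pricing.

```python
# Hypothetical unit prices; real providers publish their own rate cards.
RATES = {
    "vm_hours": 0.05,          # per virtual-machine hour
    "storage_gb_month": 0.02,  # per GB stored for a month
    "egress_gb": 0.09,         # per GB transferred out
}

def metered_bill(usage: dict[str, float], rates: dict[str, float] = RATES) -> float:
    """Pay-per-use: charge = sum of (quantity consumed x unit price), rounded to cents."""
    return round(sum(qty * rates[resource] for resource, qty in usage.items()), 2)

light_month = {"vm_hours": 100, "storage_gb_month": 50, "egress_gb": 10}
heavy_month = {"vm_hours": 720, "storage_gb_month": 500, "egress_gb": 200}
print(metered_bill(light_month))  # 6.9
print(metered_bill(heavy_month))  # 64.0
```

The bill tracks consumption: a quiet month costs little, and a busy month costs more, with no fixed fee in either case.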
Hardware Virtualization :-
It is the abstraction of computing resources from the software that uses cloud resources. It involves embedding virtual machine software into the server's hardware components; that software is called the hypervisor. The hypervisor manages the shared physical hardware resources between the guest OS and the host OS, and the abstracted hardware is presented to each guest as actual hardware.
Virtualization means abstraction, and hardware virtualization is achieved by abstracting the physical hardware using a Virtual Machine Monitor (VMM), or hypervisor. Hypervisors rely on instruction-set extensions in the processor to accelerate common virtualization activities and boost performance. The term hardware virtualization is used when the VMM, virtual machine software or any hypervisor is installed directly on the hardware system. The primary tasks of the hypervisor are process monitoring and controlling memory and hardware. After hardware virtualization is done, different operating systems can be installed, and various applications can run on them. Hardware virtualization done for server platforms is also called server virtualization.
• Types of Hardware Virtualization :-
   - Full Virtualization: The hardware architecture is completely simulated. Guest software doesn't need any modification to run applications.
   - Emulation Virtualization: The virtual machine simulates the hardware and is therefore independent of it. The guest OS doesn't require any modification.
   - Para-Virtualization: The hardware is not simulated; instead, the guest software runs in its own isolated system, and the guest OS is modified to cooperate with the hypervisor.
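The hypervisor's job of sharing one physical host among several guest operating systems can be illustrated with a toy admission check. This is a hypothetical model, not how any particular hypervisor works: it hands out whole CPUs and gigabytes of memory with no overcommitment, whereas real hypervisors use more elaborate scheduling and memory management.

```python
from dataclasses import dataclass, field

@dataclass
class VirtualMachine:
    name: str
    vcpus: int
    memory_gb: int
    guest_os: str

@dataclass
class Hypervisor:
    """Toy VMM: shares one physical host's CPUs and memory among guest VMs (no overcommit)."""
    physical_cpus: int
    physical_memory_gb: int
    vms: list[VirtualMachine] = field(default_factory=list)

    def free_cpus(self) -> int:
        return self.physical_cpus - sum(vm.vcpus for vm in self.vms)

    def free_memory_gb(self) -> int:
        return self.physical_memory_gb - sum(vm.memory_gb for vm in self.vms)

    def create_vm(self, vm: VirtualMachine) -> bool:
        """Admit the VM only if enough unallocated physical hardware remains."""
        if vm.vcpus > self.free_cpus() or vm.memory_gb > self.free_memory_gb():
            return False
        self.vms.append(vm)
        return True

host = Hypervisor(physical_cpus=16, physical_memory_gb=64)
results = [
    host.create_vm(VirtualMachine("web", 4, 8, "Linux")),
    host.create_vm(VirtualMachine("db", 8, 32, "Windows")),    # a different guest OS on the same host
    host.create_vm(VirtualMachine("batch", 8, 16, "Linux")),   # rejected: only 4 CPUs left
]
print(results)  # [True, True, False]
```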

1.2 Properties and characteristics of cloud computing :-

1. On-demand self-service :- A consumer can request and receive access to a service offering without an administrator or other support staff having to fulfil the request manually.
2. Broad network access :- Services can be accessed from any location using any type of device, i.e. anywhere, anytime access.
3. Resource pooling :- A resource can be storage, memory, network bandwidth or virtual machines, i.e. any service that can be consumed by cloud users.
I. Resource pooling means that multiple customers are served from the same physical resources.
4. Measured service :- You pay according to the services you use.
5. Rapid elasticity & scalability :- One of the great things about cloud computing is the ability to quickly provision resources in the cloud as the organization needs them, and then to remove them when they are no longer needed.
6. No maintenance / easy maintenance :- The provider maintains the underlying hardware, so users have little or nothing to maintain.
7. Security :- Copies of our data are kept on various servers. If one fails, the data is safe on the others.
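Rapid elasticity is usually driven by a scaling rule that adds capacity when load is high and removes it when load is low. The sketch below assumes a simple threshold rule on average CPU utilization; the thresholds and instance limits are hypothetical. Production autoscalers typically also use cooldown periods and metrics averaged over a time window to avoid reacting to brief spikes.

```python
def desired_instances(current: int, avg_cpu_percent: float,
                      scale_out_at: float = 70.0, scale_in_at: float = 30.0,
                      min_instances: int = 1, max_instances: int = 10) -> int:
    """Return the new instance count for one evaluation of a threshold scaling rule."""
    if avg_cpu_percent > scale_out_at:
        target = current + 1          # heavy load: provision one more instance
    elif avg_cpu_percent < scale_in_at:
        target = current - 1          # light load: release one instance
    else:
        target = current              # inside the band: keep capacity as is
    return max(min_instances, min(max_instances, target))

# Simulate a load spike followed by a quiet period.
instances = 1
history = []
for load in [85, 90, 75, 50, 20, 10]:
    instances = desired_instances(instances, load)
    history.append(instances)
print(history)  # [2, 3, 4, 4, 3, 2]
```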
1.3 Challenges and Risks :-
1. Security :- There is no doubt that cloud computing provides various advantages, but there are also some security issues in cloud computing. Some of them are as follows.
I. Data Loss :- Data loss is one of the issues faced in cloud computing, and it is closely related to data leakage. Our sensitive data is in the hands of someone else, and we don’t have full control over our database. So, if hackers break the security of the cloud service, they may get access to our sensitive data or personal files.
II. Interference of Hackers and Insecure APIs :- Cloud services are accessed over the internet, and the easiest way to communicate with the cloud is through APIs. So it is important to protect the interfaces and APIs that are used by external users. Also, some cloud services are available in the public domain; these are a vulnerable part of cloud computing because third parties may access them, and hackers may use these services to hack or harm our data.
III. User Account Hijacking :- Account hijacking is one of the most serious security issues in cloud computing. If the account of a user or an organization is hijacked by a hacker, the hacker has full authority to perform unauthorized activities.
IV. Lack of Skill :- Day-to-day work, shifting to another service provider, needing an extra feature, or learning how to use a feature are the main problems for IT companies that don’t have skilled employees. So working with cloud computing requires skilled people.
2. Privacy :-
I. Data Confidentiality Issues :- Personal data should be made unreachable to users who do not have proper authorization to access it. One way of ensuring confidentiality is to use strict access control policies and regulations.
II. Data Loss Issues :- Data loss or data theft is one of the major security challenges that cloud providers face. If a cloud vendor has reported data loss or data theft of critical or sensitive material in the past, more than sixty percent of users would decline to use the cloud services provided by that vendor.
III. Geographical Data Storage Issues :- Since the cloud infrastructure is distributed across different geographical locations throughout the world, users fear that local laws may be violated. Moreover, the dynamic nature of the cloud makes it very difficult to designate a specific server to be used for trans-border data transmission.
IV. Transparency Issues :- In cloud computing security, transparency means the willingness of a cloud service provider to reveal different details and characteristics of its security preparedness. Some of these details comprise policies and regulations on security, privacy and service level.
V. Hypervisor Related Issues :- Virtualization means the logical abstraction of computing resources from physical restrictions and constraints. But this poses new challenges for factors like user authentication, accounting and authorization. The hypervisor manages multiple virtual machines and therefore becomes a target for adversaries. Unlike physical devices, which are independent of one another, virtual machines in the cloud usually reside on a single physical device managed by the same hypervisor. Compromising the hypervisor therefore puts all of those virtual machines at risk.
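One way to enforce data confidentiality through strict access control policies is role-based access control with a deny-by-default rule. The roles, users and actions below are hypothetical; cloud platforms provide their own identity and access management services with much richer policy languages.

```python
# Hypothetical role definitions: which actions each role may perform on a dataset.
ROLE_PERMISSIONS: dict[str, set[str]] = {
    "analyst": {"read"},
    "data_engineer": {"read", "write"},
    "admin": {"read", "write", "delete"},
}

# Hypothetical user-to-role assignments.
USER_ROLES: dict[str, str] = {"asha": "analyst", "ravi": "data_engineer"}

def is_allowed(user: str, action: str) -> bool:
    """Deny by default: unknown users, unknown roles and unlisted actions get no access."""
    role = USER_ROLES.get(user)
    if role is None:
        return False
    return action in ROLE_PERMISSIONS.get(role, set())

print(is_allowed("asha", "read"))     # True
print(is_allowed("asha", "delete"))   # False
print(is_allowed("mallory", "read"))  # False: not an authorized user
```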
3. Trust :-
I. Trust is a complex issue in cloud data management and therefore an important issue in cloud computing. On a currency note, the statement “guaranteed by the central government” printed on the note is a trust statement by the government. When the currency is deposited in a bank, it is still governed by the Reserve Bank, which guarantees the safety of the deposit so that the depositor’s trust is ensured.
II. When the same analogy is applied to data in the cloud, trust becomes a very big issue once a company’s data goes onto the cloud service provider’s platform. Who will guarantee that your data is safe?
III. Trust is a very complex issue because it does not seem to have a unique definition in cloud computing; some define it as “levels of confidence in something or someone”. Relating this to the cloud, it can be described as the level of satisfaction or confidence that a user has in the services being provided by a Cloud Service Provider. Trust is based on the confidence and assurance that the data and cloud platform processes will provide an expected result with a certain level of guarantee. Trust is an important aspect of enhancing security and privacy.
IV. Trust issues with a company’s data on a cloud platform include the risk that a malicious user gains access to other users’ credentials. This makes it easy to spy on and manipulate data, and also to falsify and redirect the legitimate user’s information to competitors, or to cause data loss. Trust issues also concern how long an organization's data is kept by a cloud service provider after deletion by the company; in addition, the cloud service provider may delete or even modify an organization's data without adequate backup, leading to permanent loss of data. These are areas where cloud service providers are constantly working to improve trust.
V. Cloud computing gives the cloud service provider total control of the user’s data and resources, which leads to trust and security issues, because users should have the right to know where their data is stored and how it is being accessed. Although these issues can be handled by adding precautionary trust mechanisms like encryption and strong authentication procedures, these should not prevent users from knowing about and having control over their data. Hence cloud providers need to be more transparent to ensure trusted governance.
VI. Several efforts are ongoing to build trust between cloud service providers and cloud users. Various security strategies and technology-based confidence-building measures are evolving regularly to ensure trust in cloud computing. Pioneering organizations and initiatives such as the Cloud Security Alliance in the U.S.A., the Federal Office for Information Security (BSI) in Germany, the International Grid Trust Federation (IGTF), and the Security, Trust & Assurance Registry (STAR) help in mitigating trust issues.
VII. A reputation-based trust mechanism reflects the overall view of an industry towards a cloud service provider. It can help with cloud service provider selection but is insufficient for other important purposes. A formal process for assessment of cloud services and their providers by independent third parties should be the norm, acceptable to both cloud users and providers, with independent third-party cloud assessors as part of the SLA.
VIII. Trust is a social problem, but technological advancement can improve information credibility, reputation and trust in cloud services. Trust would be a key item on the sales team’s agenda during discussions with prospective customers. A robust trust management framework for multi-cloud environments, published as a stated policy, would address the issue of trust in cloud service providers.
4. Data Lock-In and Standardization :- This is the situation where customers are dependent (i.e. locked in) on a single cloud provider’s technology and implementation, and cannot easily move to a different vendor in the future without substantial costs, legal constraints or technical incompatibilities.
OR
Organizations may face problems when transferring their services and data from one vendor to another. Because different vendors provide different platforms and services, moving from one cloud vendor to another can be difficult.
I. Data Transfer Risk :- It is not easy to move data from one service provider to another. Many questions arise:
• Who is responsible for extracting the data from the databases?
• In what format will the data be?
• What format does the new cloud service provider use?
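The question “in what format will the data be?” is one reason exporting to open, vendor-neutral formats reduces lock-in. The sketch below uses Python's standard `json` and `csv` modules on a hypothetical customer table. It also exposes a trade-off: JSON round-trips the integer `customer_id` unchanged, while CSV brings every value back as a string, so the receiving side needs to know the schema.

```python
import csv
import io
import json

# Hypothetical customer records held with the current provider.
records = [
    {"customer_id": 1, "name": "A. Patil", "plan": "basic"},
    {"customer_id": 2, "name": "R. Shah", "plan": "premium"},
]

def export_json(rows: list[dict]) -> str:
    """Serialize rows to JSON text, which keeps numbers as numbers."""
    return json.dumps(rows, indent=2)

def export_csv(rows: list[dict]) -> str:
    """Serialize rows to CSV text with a header line taken from the first row's keys."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

def import_csv(text: str) -> list[dict]:
    """Parse CSV text back into dicts; every value comes back as a string."""
    return list(csv.DictReader(io.StringIO(text)))

json_copy = json.loads(export_json(records))
csv_copy = import_csv(export_csv(records))
print(json_copy == records)               # True
print(repr(csv_copy[0]["customer_id"]))   # '1'
```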
5. Availability :- High availability is the ultimate goal of moving to the cloud. The idea is to make your products, services and tools available to your customers and employees at any time, from anywhere, using any device with an internet connection.
• Cloud availability is related to cloud reliability.
• For example, say you have an online store that is available 24/7, but sometimes clicking the “checkout” button kicks customers out of the system before they have completed the purchase. Your store may be available all the time, but if the underlying software is not reliable, your cloud offerings are basically useless.
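Availability targets are often quoted as percentages, and it helps to translate them into downtime. The sketch assumes a 365-day year, and the serial-composition function assumes components fail independently; both are simplifications. The second function echoes the checkout example: if a purchase needs several components, the overall availability is lower than that of any single component.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; leap years ignored

def downtime_hours_per_year(availability_percent: float) -> float:
    """Maximum yearly downtime allowed by an availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)

def serial_availability(*component_availabilities: float) -> float:
    """Availability of a request path that needs every component to be up,
    assuming the components fail independently of one another."""
    result = 1.0
    for a in component_availabilities:
        result *= a
    return result

for target in (99.0, 99.9, 99.99):
    print(target, round(downtime_hours_per_year(target), 2))
# 99.0 87.6
# 99.9 8.76
# 99.99 0.88

# Hypothetical checkout path: web front end at 99.9% and payment service at 99%.
print(round(serial_availability(0.999, 0.99), 4))  # 0.989
```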
6. Fault Tolerance :- Fault tolerance in cloud computing means creating a blueprint for continuing work when some components fail or become unavailable. It helps businesses assess their infrastructure needs and requirements, and keeps services running if the relevant equipment becomes unavailable for whatever reason. The capacity of a system to recognize and recover from errors without failing can be provided by hardware, software, or a mixed approach that uses load balancers. Fault tolerance solutions are most commonly used for mission-critical applications or systems.
I. Security Breach Occurrences :- Faults may occur for a variety of reasons, including security failures. Hacking of a server harms the server and results in loss of information. Ransomware, phishing, malware attacks and other security breaches are further reasons why fault tolerance is necessary.
II. System Failure :- This may indicate a software or hardware issue. A software failure causes a system crash or hang, which might be caused by a stack overflow or another issue. Improper maintenance of the physical hardware leads to hardware system failure.
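A common fault-tolerance technique is replication: keep copies of data on several servers and fail over to another copy when one is unavailable. The sketch below is a hypothetical in-memory model (`Replica`, `ReplicatedStore` and the zone names are made up). It leaves out re-synchronizing a replica that missed writes while it was offline, which a real system must handle.

```python
class Replica:
    """One copy of the data on one (simulated) server."""
    def __init__(self, name: str):
        self.name = name
        self.data: dict[str, str] = {}
        self.online = True

    def read(self, key: str) -> str:
        if not self.online:
            raise ConnectionError(f"{self.name} is unavailable")
        return self.data[key]


class ReplicatedStore:
    """Writes go to every online replica; reads fail over to the next replica."""
    def __init__(self, replicas: list[Replica]):
        self.replicas = replicas

    def write(self, key: str, value: str) -> None:
        for replica in self.replicas:
            if replica.online:
                replica.data[key] = value

    def read(self, key: str) -> str:
        for replica in self.replicas:
            try:
                return replica.read(key)
            except ConnectionError:
                continue  # this copy is down; try the next one
        raise RuntimeError("all replicas are unavailable")


store = ReplicatedStore([Replica("zone-a"), Replica("zone-b"), Replica("zone-c")])
store.write("report.csv", "q1 figures")
store.replicas[0].online = False    # simulate a failed server
print(store.read("report.csv"))     # q1 figures (served by zone-b)
```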
7. Disaster Recovery :- Here are some of the common challenges and risks in disaster recovery for cloud computing:
I. Network Connectivity :- Cloud computing relies heavily on network connectivity, and any interruption to the network can result in downtime and data loss. Organizations need to ensure that they have redundant and reliable network connectivity to minimize the risk of interruption.
II. Data Security :- Data security is a critical concern in cloud computing, and organizations need to ensure that their data is protected against unauthorized access, theft and cyber-attacks. Data encryption, access controls and regular security audits can help to mitigate this risk.
III. Service Provider Dependence :- Organizations that rely on cloud service providers for disaster recovery services may be at risk of dependence on a single provider. If the provider experiences an outage or data loss, the organization may experience significant downtime and data loss.

8. Resource Management and Energy Efficiency :-
• Cloud resource management refers to the process of overseeing and optimizing the use of various resources within a cloud environment. These resources can include computing power (such as virtual machines and containers), storage (such as databases and object storage), networking (such as load balancers and virtual networks), and other services and tools offered by cloud service providers.
• Its primary goals are to enhance performance, ensure security and maintain cost-effectiveness. This involves monitoring and auditing resource usage, automating resource provisioning and decommissioning, and adjusting resource allocations according to demand.
• Challenges :-
I. Cost Optimization Issues :- The pay-as-you-go model of cloud services can lead to overspending if resources are not used efficiently. Without proper monitoring and optimization, businesses may pay for unused or underutilized resources, resulting in significant costs.
II. Resource Allocation Concerns :- Effective resource allocation is key to maximizing the efficiency of cloud resources. Businesses must allocate resources strategically based on workload demands and priorities. Workload management tools and resource tagging can optimize allocation and improve resource utilization.
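Both challenges above can start with simple reports over usage data. The sketch below assumes a hypothetical inventory with a 30-day average CPU figure, a monthly cost and a `team` tag per resource, plus an arbitrary 10% CPU threshold for “underutilized”; in practice these figures would come from the provider's monitoring and billing exports.

```python
from collections import defaultdict

# Hypothetical inventory: 30-day average CPU utilization (%), monthly cost and a team tag.
resources = [
    {"id": "vm-1", "team": "ml", "avg_cpu": 62.0, "monthly_cost": 120.0},
    {"id": "vm-2", "team": "ml", "avg_cpu": 3.5, "monthly_cost": 120.0},
    {"id": "vm-3", "team": "web", "avg_cpu": 45.0, "monthly_cost": 80.0},
    {"id": "vm-4", "team": "web", "avg_cpu": 1.2, "monthly_cost": 80.0},
]

def underutilized(items: list[dict], cpu_threshold: float = 10.0) -> list[str]:
    """IDs of resources whose average CPU use is below the threshold."""
    return [r["id"] for r in items if r["avg_cpu"] < cpu_threshold]

def cost_by_tag(items: list[dict], tag: str = "team") -> dict[str, float]:
    """Total monthly cost grouped by a tag, showing who is spending what."""
    totals: dict[str, float] = defaultdict(float)
    for r in items:
        totals[r[tag]] += r["monthly_cost"]
    return dict(totals)

idle = underutilized(resources)
wasted = sum(r["monthly_cost"] for r in resources if r["id"] in idle)
print(idle, wasted)              # ['vm-2', 'vm-4'] 200.0
print(cost_by_tag(resources))    # {'ml': 240.0, 'web': 160.0}
```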

1.4 Advantages of cloud computing in Machine Learning:-


Cloud platforms are increasingly where machine learning and big data analysis are done, and that brings a lot of benefits for companies.
I. It enables experimenting with and testing multiple models :-
The cloud allows you to scale your machine learning projects up and down as needed. You can start with a small set of data points and add more as you become more confident in your predictions.
II. It’s inexpensive :-
When using machine learning in the cloud, you only pay for what you consume, which works wonders for scalability. Whether you’re just experimenting personally or serving millions of customers, you can scale to meet your needs and pay only for what you use.
III. It needs less technical knowledge:-
Popular cloud services like AWS, Microsoft Azure,
and Google Cloud Platform in fact offer machine
learning options that don’t require deep knowledge of
AI, machine learning theory, or a large team of data
scientists.
With the cloud, AI can be deployed in a matter of
minutes. It also scales automatically, so you don’t have
to worry about the technical complexity of
provisioning resources or managing infrastructure.
IV. Easy Integration :-
Most popular cloud services also provide SDKs (software development kits) and APIs. These allow you to embed machine learning functionality directly into applications, and they support most popular programming languages. With the cloud, you can integrate machine learning into your workflows quickly and easily; in the past, machine learning models were difficult to integrate into existing applications.
V. It reduces time-to-value :-
Another important aspect of the cloud is that it reduces time-to-value: the amount of time it takes from starting a project to seeing results from it.
In traditional machine learning deployments, this process can take months or even years. With the cloud, you can start seeing results in hours or days, because you don’t have to provision resources or manage infrastructure, and managed services reduce the amount of code you need to write. In many cases you can simply upload your data and start building models.
VI. Access to more data :-
Data is the lifeblood of machine learning; in general, the more relevant, good-quality data you have, the better your models can be. The cloud provides access to more data than ever before.
For example, if you’re building a predictive model for customer churn, you can access historical customer data stored in the cloud and use it to train your machine learning model so that it makes better predictions.
VII. Security and privacy :-
When done right, machine learning in the cloud can be secure and private, because the data is stored in the cloud provider’s secure data center.
The cloud provider is responsible for the security of the data center and its underlying infrastructure, so you don’t have to build that infrastructure security yourself. You remain responsible for how access to your own data and services is configured (often called the shared responsibility model).
In addition, most cloud providers offer additional security features, such as encryption, to further protect your data.

VIII. It frees up resources :-
Machine learning in the cloud frees up resources so that you can focus on other things. For example, if you’re building a machine learning model to predict demand for a new product, you can use the cloud to train and deploy the model, which frees up your time for other work such as marketing the product.
When done right, machine learning in the cloud provides a number of benefits that are difficult or impossible to achieve with traditional machine learning, including reduced time-to-value, easier integration, and increased security and privacy.
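Trying several candidate models and keeping the one with the lowest validation error is the loop that on-demand cloud capacity makes cheap to run at scale, since each candidate could be trained on separately rented machines. The sketch below runs that loop locally on a tiny hypothetical dataset, comparing a mean baseline with a least-squares line, using only the Python standard library.

```python
from statistics import mean

# Tiny hypothetical dataset where y is roughly 2x + 1.
train = [(1, 3.1), (2, 4.9), (3, 7.2), (4, 8.8), (5, 11.1)]
valid = [(6, 13.0), (7, 14.9)]

def fit_mean(data):
    """Baseline model: always predict the mean of the training targets."""
    m = mean(y for _, y in data)
    return lambda x: m

def fit_linear(data):
    """Ordinary least squares fit of y = a*x + b."""
    x_bar = mean(x for x, _ in data)
    y_bar = mean(y for _, y in data)
    a = sum((x - x_bar) * (y - y_bar) for x, y in data) / sum((x - x_bar) ** 2 for x, _ in data)
    b = y_bar - a * x_bar
    return lambda x: a * x + b

def mse(model, data):
    """Mean squared error of a model on (x, y) pairs."""
    return mean((model(x) - y) ** 2 for x, y in data)

# Each candidate is trained on the training split and scored on the validation split.
candidates = {"mean_baseline": fit_mean, "linear": fit_linear}
scores = {name: mse(fit(train), valid) for name, fit in candidates.items()}
best = min(scores, key=scores.get)
print(best)  # linear
```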
