Cloud Computing Unit 5
Security in Clouds
Cloud security challenges
Software as a Service Security
Common Standards
The Open Cloud Consortium
The Distributed Management Task Force
Standards for Application Developers
Standards for Messaging
Standards for Security
End user Access to Cloud Computing
Mobile Internet devices and the Cloud
Hadoop, MapReduce, Virtual Box, Google App Engine
Programming Environment for Google App Engine
Security in Clouds
Cloud Security, also known as cloud computing security, consists of a set of policies, controls, procedures
and technologies that work together to protect cloud-based systems, data, and infrastructure.
These security measures are configured to protect cloud data, support regulatory compliance and
protect customers' privacy as well as setting authentication rules for individual users and devices.
From authenticating access to filtering traffic, cloud security can be configured to the exact needs of the
business. And because these rules can be configured and managed in one place, administration
overheads are reduced and IT teams are empowered to focus on other areas of the business.
The way cloud security is delivered will depend on the individual cloud provider or the cloud security
solutions in place. However, implementation of cloud security processes should be a joint responsibility
between the business owner and solution provider.
For businesses making the transition to the cloud, robust cloud security is imperative. Security threats are
constantly evolving and becoming more sophisticated, and cloud computing is no less at risk than an
on-premises environment. For this reason, it is essential to work with a cloud provider that offers best-in-class
security that has been customized for your infrastructure.
Benefits of Cloud Security
1. Centralized security: Just as cloud computing centralizes applications and data, cloud
security centralizes protection. Cloud-based business networks consist of numerous
devices and endpoints that can be difficult to manage when dealing with shadow IT or
BYOD. Managing these entities centrally enhances traffic analysis and web filtering,
streamlines the monitoring of network events and results in fewer software and policy
updates. Disaster recovery plans can also be implemented and actioned easily when they
are managed in one place.
2. Reduced costs: One of the benefits of utilizing cloud storage and security is that it
eliminates the need to invest in dedicated hardware. Not only does this reduce capital
expenditure, but it also reduces administrative overheads. Where once IT teams were
firefighting security issues reactively, cloud security delivers proactive security features
that offer protection 24/7 with little or no human intervention.
4. Reliability: Cloud computing services can offer a high degree of dependability. With the
right cloud security measures in place, users can safely access data and applications
within the cloud no matter where they are or what device they are using.
Software as a Service Security
SaaS security is cloud-based security designed to protect the data that
software-as-a-service (SaaS) applications carry.
It’s a set of practices that companies that store data in the cloud put in place to
protect sensitive information pertaining to their customers and the business itself.
However, SaaS security is not the sole responsibility of the organization using the
cloud service. In fact, the service customer and the service provider share the
obligation to adhere to SaaS security guidelines published by the National Cyber
Security Centre (NCSC).
SaaS security is also an important part of SaaS management, which aims to reduce
unused licenses and shadow IT and to decrease security risks by creating as much
visibility as possible.
6 SaaS Security best practices
One of the main benefits that SaaS has to offer is that the respective applications are on-
demand, scalable, and very fast to implement, saving companies valuable resources and
time. On top of that, the SaaS provider typically handles updates and takes care of software
This flexibility and the fairly open access have created new security risks that SaaS security
best practices are trying to address and mitigate. Below are 6 security practices and
solutions that every cloud-operating business should know about.
1. Enhanced Authentication
Offering a cloud-based service to your customers means that there has to be a way for them
to access the software. Usually, this access is regulated through login credentials. That’s
why knowing how your users access the resource and how the third-party software provider
handles the authentication process is a great starting point.
Once you understand the various methods, you can make better SaaS security decisions and
enable additional security features like multifactor authentication or integrate other enhanced
authentication methods.
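As an illustration of a second factor, the sketch below generates and checks time-based one-time passwords. It assumes the TOTP scheme from RFC 6238 with HMAC-SHA1; the function names and the 6-digit, 30-second defaults are choices made for this example, and a real SaaS provider may use a different method entirely.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226)."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over a time counter."""
    return hotp(secret, unix_time // step, digits)


def verify_second_factor(secret: bytes, submitted: str, unix_time: int) -> bool:
    """Check the submitted code using a constant-time comparison."""
    return hmac.compare_digest(totp(secret, unix_time), submitted)
```

The password remains the first factor; the server would call verify_second_factor only after the password check succeeds, using a secret shared with the user's authenticator app at enrollment.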
2. Data Encryption
The majority of channels that SaaS applications use to communicate employ TLS (Transport Layer Security)
to protect data that is in transit. However, data that is at rest can be just as vulnerable to cyber attacks as
data that is being exchanged. That’s why more and more SaaS providers offer encryption capabilities that
protect data in transit and at rest. It’s a good idea to talk to your provider and check whether enhanced data
encryption is available for all the SaaS services you use.
5. Consider CASBs
It is possible that the SaaS provider that you are choosing is not able to provide the level of SaaS security that your
company requires. If there are no viable alternatives when it comes to the vendor, consider cloud access security
broker (CASB) tool options. This allows your company to add a layer of additional security controls that are not native
to your SaaS application. When selecting a CASB –whether proxy or API-based –make sure it fits into your existing IT
The most well-known standard in information security and compliance is ISO 27001,
developed by the International Organization for Standardization.
The ISO 27001 standard was created to assist enterprises in protecting sensitive data by
following best practices.
Cloud compliance is the principle that cloud-delivered systems must comply with
the standards their customers require, so that the cloud computing services built
on them meet those customers' compliance requirements.
OCC manages and operates resources including the Open Science Data Cloud (aka OSDC), which is a
multi-petabyte scientific data sharing resource.
The consortium is based in Chicago, Illinois, and is managed by the 501(c)3 Center for Computational Science
3. The Open Cloud Testbed - This working group manages and operates the Open Cloud Testbed. The
Open Cloud Testbed (OCT) is a geographically distributed cloud testbed spanning four data centers and
connected with 10G and 100G network connections. The OCT is used to develop new cloud computing
software and infrastructure.
4. The Biomedical Data Commons - The Biomedical Data Commons (BDC) is cloud-based infrastructure that
provides secure, compliant cloud services for managing and analyzing genomic data, electronic medical records
(EMR), medical images, and other PHI data. It provides resources to researchers so that they can more easily make
discoveries from large complex controlled access datasets. The BDC provides resources to those institutions in the
BDC Working Group. It is an example of what is sometimes called the condominium model of sharing research
infrastructure in which the research infrastructure is operated by a consortium of educational and research
organizations and provides resources to the consortium.
5. NOAA Data Alliance Working Group - The OCC National Oceanic and Atmospheric
Administration (NOAA) Data Alliance Working Group supports and manages the NOAA data
commons and the surrounding community interested in the open redistribution of NOAA
In 2015, the OCC was accepted into the Matter healthcare community at Chicago's historic
Merchandise Mart. Matter is a community of healthcare entrepreneurs and industry leaders
working together in a shared space to individually and collectively fuel the future of
healthcare innovation.
In 2015, the OCC announced a collaboration with the National Oceanic and Atmospheric
Administration (NOAA) to help release their vast stores of environmental data to the general
public. This effort is managed by the OCC's NOAA data alliance working group.
The Distributed Management Task Force (DMTF)
DMTF is a 501(c)(6) nonprofit industry standards organization that creates open manageability standards spanning diverse emerging
and traditional IT infrastructures including cloud, virtualization, network, servers and storage. Member companies and alliance
partners collaborate on standards to improve interoperable management of information technologies.
Based in Portland, Oregon, the DMTF is led by a board of directors representing technology companies including: Broadcom Inc., Cisco,
Dell Technologies, Hewlett Packard Enterprise, Intel Corporation, Lenovo, NetApp, Positivo Tecnologia S.A., and Verizon.
Founded in 1992 as the Desktop Management Task Force, the organization’s first standard was the now-legacy Desktop Management
Interface (DMI). As the organization evolved to address distributed management through additional standards, such as the Common
Information Model (CIM), it changed its name to the Distributed Management Task Force in 1999, but is now known simply as DMTF.
The DMTF continues to address converged, hybrid IT and the Software Defined Data Center (SDDC)
with its latest specifications, such as the CADF (Cloud Auditing Data Federation), CIMI (Cloud Infrastructure Management Interface), CIM
(Common Information Model), DASH (Desktop and Mobile Architecture for System Hardware), MCTP (Management
Component Transport Protocol), NC-SI (Network Controller Sideband Interface), OVF (Open Virtualization Format), PLDM (Platform
Level Data Model), Redfish Device Enablement (RDE), Redfish (including Protocols, Schema, Host Interface, Profiles), SMASH (Systems Management
Architecture for Server Hardware) and SMBIOS (System Management BIOS).
DMTF enables more effective management of millions of IT systems
worldwide by bringing the IT industry together to collaborate on
the development, validation and promotion of systems
management standards.
The group spans the industry with 160 member companies and
organizations, and more than 4,000 active participants across
43 countries.
The DMTF board of directors is led by 16 innovative, industry-
leading technology companies.
DMTF management standards are critical to enabling management
interoperability among multi-vendor systems, tools and solutions within
the enterprise.
The DMTF started the Virtualization Management Initiative (VMAN).
The Open Virtualization Format (OVF) is a fairly new standard that has
emerged within the VMAN Initiative.
Benefits of VMAN include lowering the IT learning curve and lowering complexity
for vendors implementing their solutions.
Standardized approaches available to companies due to the VMAN Initiative:
Deploy virtual computer systems
Discover and take inventory of virtual computer systems
Manage the life cycle of virtual computer systems
Add/change/delete virtual resources
Monitor virtual systems for health and performance
Standards for Application Developers
The purpose of application development standards is to ensure
uniform, consistent, high-quality software solutions.
An Ajax framework helps developers to build dynamic web pages on the client
side. Data is sent to or from the server using requests, usually written in
The acronym LAMP derives from the fact that it includes Linux, Apache,
MySQL, and PHP (or Perl or Python) and is considered by many to be
the platform of choice for development and deployment of high-
performance web applications which require a solid and reliable
The Post Office Protocol (POP) was introduced to circumvent this situation.
Once the client connects, POP servers begin to download the messages and subsequently
delete them from the server (a default setting) in order to make room for more messages.
Internet Message Access Protocol (IMAP)
Once mail messages are downloaded with POP, they are automatically deleted
from the server when the download process has finished.
To get around these problems, a standard called Internet Message Access Protocol (IMAP)
was created. IMAP allows messages to be kept on the server but viewed and
manipulated (usually via a browser) as though they were stored locally.
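The difference between the two retrieval models can be sketched with a toy in-memory mailbox. This is only an illustration of the behavior described above: ToyMailServer and its methods are hypothetical names, and nothing here speaks the real POP or IMAP wire protocols.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMailServer:
    """In-memory stand-in for a mail server; not a real POP or IMAP client."""
    messages: dict = field(default_factory=dict)  # message id -> body
    seen: set = field(default_factory=set)        # ids flagged as read

    def pop_download(self) -> list:
        """POP-style default: hand over every message, then delete it."""
        downloaded = [self.messages[mid] for mid in sorted(self.messages)]
        self.messages.clear()
        return downloaded

    def imap_fetch(self, message_id: int) -> str:
        """IMAP-style: return the body and flag it as seen, keeping it on the server."""
        self.seen.add(message_id)
        return self.messages[message_id]
```

After pop_download the server copy is gone, so a second device sees an empty mailbox; after imap_fetch every device still sees the same messages and flags.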
Standards for Security
Security standards define the processes, procedures, and practices
necessary for implementing a secure environment that provides
privacy and security of confidential information in a cloud
Security protocols used in the cloud include:
Security Assertion Markup Language (SAML)
Open Authentication (OAuth)
Security Assertion Markup Language (SAML)
SAML is an XML-based standard for communicating authentication, authorization,
and attribute information among online partners. It allows businesses to securely
send assertions between partner organizations regarding the identity and
entitlements of a principal.
SAML allows a user to log on once for affiliated but separate Web sites.
SAML is designed for business-to-business (B2B) and business-to-consumer
(B2C) transactions.
SAML is built on a number of existing standards, namely, SOAP, HTTP, and
XML. SAML relies on HTTP as its communications protocol and specifies the
use of SOAP.
Most SAML transactions are expressed in a standardized form of XML.
SAML assertions and protocols are specified using XML schema.
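To make the idea of an XML assertion concrete, the sketch below reads the issuer, subject, and attributes from a small assertion. The assertion text is hypothetical and heavily simplified (it omits the ID, timestamps, conditions, and signature a real one carries), and read_assertion is a name chosen for this example. A real service provider must validate the signature before trusting any of these values.

```python
import xml.etree.ElementTree as ET

SAML_NS = {"saml": "urn:oasis:names:tc:SAML:2.0:assertion"}

# Hypothetical, unsigned assertion for illustration only.
EXAMPLE_ASSERTION = """\
<saml:Assertion xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion" Version="2.0">
  <saml:Issuer>https://idp.example.org</saml:Issuer>
  <saml:Subject>
    <saml:NameID>alice@example.org</saml:NameID>
  </saml:Subject>
  <saml:AttributeStatement>
    <saml:Attribute Name="role">
      <saml:AttributeValue>auditor</saml:AttributeValue>
    </saml:Attribute>
  </saml:AttributeStatement>
</saml:Assertion>
"""


def read_assertion(xml_text: str) -> dict:
    """Extract who issued the assertion, who it is about, and their entitlements."""
    root = ET.fromstring(xml_text)
    attributes = {
        attr.get("Name"): [v.text for v in attr.findall("saml:AttributeValue", SAML_NS)]
        for attr in root.iterfind(".//saml:Attribute", SAML_NS)
    }
    return {
        "issuer": root.findtext("saml:Issuer", namespaces=SAML_NS),
        "subject": root.findtext("saml:Subject/saml:NameID", namespaces=SAML_NS),
        "attributes": attributes,
    }
```

The issuer identifies the partner making the statement, the subject is the principal, and the attributes carry the entitlements mentioned above.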
Open Authentication (OAuth)
OAuth is an open protocol, initiated by Blaine Cook and Chris Messina,
to allow secure API authorization in a simple, standardized method for
various types of web applications.
OAuth is a method for publishing and interacting with protected
OAuth provides users access to their data while protecting
account credentials.
OAuth by itself provides no privacy at all and depends on other protocols
such as SSL to accomplish that.
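A minimal sketch of the idea: the client presents a token rather than the account password, and relies on TLS for privacy. It assumes the bearer-token header style used by OAuth 2.0, a later revision than the original protocol described here. The URL and token in the usage note are placeholders, and build_authorized_request is a name chosen for this example. No network call is made.

```python
import urllib.request


def build_authorized_request(url: str, access_token: str) -> urllib.request.Request:
    """Attach an access token instead of the user's password."""
    if not url.startswith("https://"):
        # OAuth itself does not encrypt traffic, so insist on TLS.
        raise ValueError("refusing to send a token over a non-TLS URL")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Bearer {access_token}")
    return request
```

For example, build_authorized_request("https://api.example.org/photos", "token-123") yields a request whose Authorization header carries the token; the account credentials never leave the authorization server.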
OpenID is an open, decentralized standard for user authentication and access
control that allows users to log onto many services using the same digital
It is a single-sign-on (SSO) method of access control.
It replaces the common log-in process (i.e., a log-in name and a password)
by allowing users to log in once and gain access to resources across
participating systems.
An OpenID is in the form of a unique URL and is authenticated by the
entity hosting the OpenID URL.
Transport Layer Security (TLS) and its predecessor, Secure Sockets Layer (SSL), are
cryptographically secure protocols designed to provide security and data integrity for
communications over TCP/IP.
TLS and SSL encrypt the segments of network connections at the transport layer.
TLS provides endpoint authentication and data confidentiality by using
TLS involves three basic phases:
Peer negotiation for algorithm support
Key exchange and authentication
Symmetric cipher encryption and message authentication
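The three phases above are carried out by the TLS library during the handshake, so application code mostly decides what to accept. The sketch below is one way to prepare a client-side configuration with Python's standard ssl module. The choice of TLS 1.2 as the minimum version is an assumption for this example, and make_client_context is a hypothetical helper name.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Client settings: verify the server certificate and refuse old protocols."""
    context = ssl.create_default_context()            # loads the system CA certificates
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse SSL and early TLS versions
    return context
```

Passing a connected socket to context.wrap_socket(sock, server_hostname=host) then runs the negotiation, key exchange, and authentication before any application data is sent.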
End user Access to Cloud Computing
In its strictest sense, end-user computing (EUC) refers to computer systems and
platforms that help non-programmers create applications. ... What's important is that
a well-designed EUC/VDI plan can allow users to access the digital platforms they need
to be productive, both on-premises and working remotely in the cloud.
An End-User Computing application or EUC is any application that is not managed and
developed in an environment that employs robust IT general controls. ... Although
the most pervasive EUCs are spreadsheets, EUCs also can include user databases,
queries, scripts, or output from various reporting tools.
Broadly, end-user computing covers a wide range of user-facing resources, such as:
desktop and notebook end user computers; desktop operating systems and
applications; wearables and smartphones; cloud, mobile, and web applications; and
virtual desktops and applications.
Mobile Internet devices and the Cloud
Mobile cloud computing uses cloud computing to deliver applications to mobile devices. These
mobile apps can be deployed remotely and quickly, using flexible development tools.
Mobile cloud storage is a form of cloud storage that is accessible on mobile devices such as
laptops, tablets, and smartphones. Mobile cloud storage providers offer services that allow the
user to create and organize files, folders, music, and photos, similar to other cloud computing
The mobile cloud is Internet-based data, applications and related services accessed through
smartphones, laptop computers, tablets and other portable devices. Mobile cloud computing
is differentiated from mobile computing in general because the devices run cloud-based web
apps rather than native apps.
Locator apps and remote backup are two types of cloud-enabled services for mobile devices.
A mobile cloud app is a software program designed to be accessible via the internet through
portable devices. In terms of the real world, there are many examples of mobile cloud
solutions, including: Email.
Hadoop
It is a collection of open-source software utilities that facilitates using a network of many
computers to solve problems involving massive amounts of data and computation.
It provides a software framework for distributed storage and processing of big data using
the MapReduce programming model.
Hadoop was originally designed for computer clusters built from commodity hardware, which
is still the common use. It has since also found use on clusters of higher-end hardware.
All the modules in Hadoop are designed with a fundamental assumption that hardware
failures are common occurrences and should be automatically handled by the framework.
The core of Apache Hadoop consists of a storage part, known as Hadoop Distributed File
System (HDFS), and a processing part which is a MapReduce programming model.
Hadoop splits files into large blocks and distributes them across nodes in a cluster. It then
transfers packaged code into nodes to process the data in parallel. This approach takes
advantage of data locality, where nodes manipulate the data they have access to.
This allows the dataset to be processed faster and more efficiently than it would be in a more
conventional supercomputer architecture that relies on a parallel file system where
computation and data are distributed via high-speed networking.
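The splitting and distribution step can be sketched as follows. This is a toy model under stated assumptions: a 128 MiB block size, three replicas, and simple round-robin placement. Real HDFS placement is rack-aware and more involved, and place_blocks is a hypothetical helper rather than a Hadoop API.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MiB, an assumed block size


def place_blocks(file_size: int, nodes: list, replication: int = 3) -> dict:
    """Map each block index to the nodes holding a replica (round-robin)."""
    n_blocks = math.ceil(file_size / BLOCK_SIZE)
    copies = min(replication, len(nodes))
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(copies)]
        for block in range(n_blocks)
    }
```

A scheduler that knows this mapping can send the code for block 0 to one of the nodes that already stores block 0, which is the data locality the text describes.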
The base Apache Hadoop framework is composed of the following modules:
Hadoop Common – contains libraries and utilities needed by other Hadoop modules;
Hadoop Distributed File System (HDFS) – a distributed file-system that stores data on commodity
machines, providing very high aggregate bandwidth across the cluster;
Hadoop YARN – (introduced in 2012) a platform responsible for managing computing resources in
clusters and using them for scheduling users' applications;
Hadoop MapReduce – an implementation of the MapReduce programming model for large-scale
data processing.
Hadoop Ozone – (introduced in 2020) an object store for Hadoop.
The term Hadoop is often used for both base modules and sub-modules and also the ecosystem, or
collection of additional software packages that can be installed on top of or alongside Hadoop, such as
Apache Pig, Apache Hive, Apache HBase, Apache Phoenix, Apache Spark, Apache ZooKeeper,
Cloudera Impala, Apache Flume, Apache Sqoop, Apache Oozie, and Apache Storm.
Apache Hadoop's MapReduce and HDFS components were inspired by Google papers on MapReduce
and Google File System.
The Hadoop framework itself is mostly written in the Java programming language, with some native code
in C and command line utilities written as shell scripts. Though MapReduce Java code is common, any
programming language can be used with Hadoop Streaming to implement the map and reduce parts of the
user's program. Other projects in the Hadoop ecosystem expose richer user in
MapReduce is a programming model or pattern within the Hadoop framework that is used to
access big data stored in the Hadoop Distributed File System (HDFS). ... MapReduce facilitates concurrent
processing by splitting petabytes of data into smaller chunks, and processing them in parallel on
Hadoop commodity servers.
MapReduce is a programming model for processing large amounts of data in a parallel and
distributed fashion. It is useful for large, long-running jobs that cannot be handled within the scope of
a single request, tasks like:
Analyzing application logs
Aggregating related data from external sources
Transforming data from one format to another
Exporting data for external analysis
App Engine MapReduce is a community-maintained, open source library that is built on top of
App Engine services, including Datastore and Task Queues. The library is available on GitHub at
these locations:
Java source project.
Python source project.
MapReduce is a software framework for easily writing
applications which process vast amounts of data (multi-terabyte
data-sets) in-parallel on large clusters (thousands of nodes) of
commodity hardware in a reliable, fault-tolerant manner.
A MapReduce job usually splits the input data-set into
independent chunks which are processed by the map tasks in a
completely parallel manner.
The framework sorts the outputs of the maps, which are then
input to the reduce tasks.
Typically both the input and the output of the job are stored in a
The framework takes care of scheduling tasks, monitoring them,
and re-executing the failed tasks.
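The map, sort, and reduce steps described above can be sketched in a single process with a word count, the usual introductory example. This is plain Python rather than the Hadoop API; the function names are chosen for this illustration, and a real job would run the map and reduce tasks on many nodes in parallel.

```python
from collections import defaultdict


def map_phase(chunk: str):
    """Map task: emit a (word, 1) pair for every word in one input chunk."""
    for word in chunk.lower().split():
        yield word, 1


def shuffle_and_sort(pairs):
    """Framework step: group values by key and order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Reduce task: combine all values seen for one key."""
    return key, sum(values)


def run_job(chunks):
    """Run every chunk through map, then sort, then reduce."""
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    return [reduce_phase(key, values) for key, values in shuffle_and_sort(mapped)]
```

Each element of chunks stands for one independent input split, so the calls to map_phase do not depend on each other and could run on separate machines.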
Typically the compute nodes and the storage nodes are the same, that is, the
MapReduce framework and the Hadoop Distributed File System are running on
the same set of nodes. This configuration allows the framework to effectively
schedule tasks on the nodes where data is already present, resulting in very
high aggregate bandwidth across the cluster.
The MapReduce framework consists of a single master JobTracker and one
slave TaskTracker per cluster-node. The master is responsible for scheduling
the jobs' component tasks on the slaves, monitoring them and re-executing
the failed tasks. The slaves execute the tasks as directed by the master.
Minimally, applications specify the input/output locations and supply map and
reduce functions via implementations of appropriate interfaces and/or abstract-
classes. These, and other job parameters, comprise the job configuration.
The Hadoop job client then submits the job (jar/executable etc.) and
configuration to the JobTracker which then assumes the responsibility of
distributing the software/configuration to the slaves, scheduling tasks and
monitoring them, providing status and diagnostic information to the job-client.
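The master's role of assigning tasks and re-executing failures can be sketched as a small loop. This is a sequential toy under stated assumptions, not the JobTracker implementation: run_with_retries is a hypothetical name, tasks are plain callables that signal failure with RuntimeError, and the retry limit of three is chosen for the example.

```python
def run_with_retries(tasks: dict, workers: list, max_attempts: int = 3) -> dict:
    """Toy master loop: assign tasks to workers and re-execute failures.

    `tasks` maps a task name to a callable taking the attempt number.
    """
    results, attempts = {}, {}
    for index, (name, task) in enumerate(tasks.items()):
        for attempt in range(1, max_attempts + 1):
            # Rotate through the workers so a retry lands on a different one.
            worker = workers[(index + attempt) % len(workers)]
            attempts[name] = attempt
            try:
                results[name] = (worker, task(attempt))
                break
            except RuntimeError:
                continue  # task failed on this worker; schedule it again
        else:
            raise RuntimeError(f"task {name!r} failed {max_attempts} times")
    return {"results": results, "attempts": attempts}
```

The returned attempts count plays the part of the status and diagnostic information that the master reports back to the job client.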
VirtualBox is a general-purpose Type-2 hypervisor for x86 and x86-64 hardware,
developed by Oracle Corp. and targeted at server, desktop, and embedded use,
that allows users and administrators to easily run multiple guest operating systems on a
single host.
VirtualBox was originally created by Innotek GmbH, which was acquired by Sun
Microsystems in 2008, which was in turn acquired by Oracle in 2010.
VirtualBox may be installed on Microsoft Windows, macOS, Linux, Solaris and
OpenSolaris. There are also ports to FreeBSD and Genode.
It supports the creation and management of guest virtual machines running Windows,
Linux, BSD, OS/2, Solaris, Haiku, and OSx86, as well as limited virtualization of macOS
guests on Apple hardware. For some guest operating systems, a "Guest Additions"
package of device drivers and system applications is available, which typically improves
performance, especially that of graphics, and allows changing the resolution of the guest
OS automatically when the window of the virtual machine on the host OS is resized.
Google App Engine
Google App Engine (often referred to as GAE or simply App Engine) is a cloud computing
platform as a service for developing and hosting web applications in Google-managed
data centers. Applications are sandboxed and run across multiple servers.
Google App Engine is a platform-as-a-service (PaaS) offering that gives software
developers access to Google's scalable hosting.
An App Engine web application can be described as having three major parts:
Application instances
Scalable data storage
Scalable services
Programming Environment for Google App Engine
App Engine offers automatic scaling for
web applications—as the number of requests increases for an application, App Engine automatically
allocates more resources for the web application to handle the additional demand.
Google App Engine primarily supports Go, PHP, Java, Python, Node.js, .NET, and Ruby applications,
although it can also support other languages via "custom runtimes". The service is free up to a
certain level of consumed resources, but only in the standard environment, not in the flexible
environment. Fees are charged for additional storage, bandwidth, or instance hours required by the
application. It was first released as a preview version in April 2008 and came out of preview in
September 2011.
The environment you choose depends on the language and related technologies you want to use for
developing the application.
Runtimes and framework
Python web frameworks that run on Google App Engine include Django, CherryPy, Pyramid,
Flask, web2py and webapp2, as well as a custom Google-written webapp framework and
several others designed specifically for the platform that emerged since the release.
Any Python framework that supports the WSGI using the CGI adapter can be used to create
an application; the framework can be uploaded with the developed application. Third-party
libraries written in pure Python may also be uploaded.
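For reference, a WSGI application is just a callable with a fixed signature, which is why so many frameworks can be hosted the same way. The sketch below uses only the standard library; call_app is a hypothetical helper that invokes the application without a server, and none of this is specific to App Engine.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Smallest useful WSGI callable: one plain-text response."""
    body = f"Hello from {environ['PATH_INFO']}".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(path: str):
    """Invoke the app without a server, the way a WSGI host would."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the remaining required keys
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    chunks = application(environ, start_response)
    return captured["status"], b"".join(chunks)
```

A framework such as Flask or Django exposes an object with this same calling convention, and the hosting platform supplies environ and start_response for each request.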
Google App Engine supports many Java standards and frameworks. Core to this is
the servlet 2.5 technology using the open-source Jetty Web Server, along with
accompanying technologies such as JSP. JavaServer Faces operates with some
workarounds. A newer release of App Engine Standard Java, in beta, supports Java 8, Servlet
3.1 and Jetty 9.
Though the integrated database, Google Cloud Datastore, may be unfamiliar
to programmers, it is accessed and supported with JPA, JDO, and by the
simple low-level API.
There are several alternative libraries and frameworks you can use to model
and map the data to the database, such as Objectify, Slim3 and Jello.
The Spring Framework works with GAE. However, the Spring Security module
(if used) requires workarounds. Apache Struts 1 is supported, and Struts 2
runs with workarounds.
The Django web framework and applications running on it can be used on App
Engine with modification.
Django-nonrel aims to allow Django to work with non-relational databases and
the project includes support for App Engine.