Aws S3: A Quick Introduction To Amazon S3

Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 15

AWS S3

Table of Content for AWS S3 Tutorial

 What is Amazon S3?


o How is S3 Secure?
o How is S3 Durable?
o Data Storage in S3
o Storage Classes in S3
o Data Organization in S3
o Amazon S3 with Example
 Amazon Benefits or Usage
o Compliance Capability
o Flexibility Management
o Durability, Scalability, and Availability
o Systematic Work Approach
o Data Transfer Flexibility
o Large Data Storage
o Data archiving
o Backup & Recovery
 Cross-region replication in Amazon S3
 S3 Pricing Structure
 Amazon S3 Use Cases
 Object lifecycle management in Amazon S3
 Encryption in Amazon S3

To know more what should be included in the AWS course to learn, check out the
syllabus details first.

A Quick Introduction to Amazon S3


Amazon S3 (Simple Storage Service) is designed for the internet to enable large-
capacity, low-cost storage provision across different geographical areas. It
provides IT team and developers with durable, flexible, and durable, object storage.

Topics to be covered in the introductory section:

 How is S3 Secure?
 How is S3 Durable?
 Data Storage in S3
 Storage Classes in S3
 Data Organization in S3
 Amazon S3 with Example

Let us discuss each of the sub-sections one by one below:

How is S3 Secure?

 Data encryption for the stored data can happen in two ways:
o Client-side Encryption
o Server-side Encryption
 Multiple copies of data are maintained to enable the regeneration of data in
case of data corruption.
 Also, versioning is performed wherein each edit is archived for potential
retrieval.

How is S3 Durable?

 It verifies the integrity of stored data using a checksum. If S3 detects any


corruption in data, it can be quickly repaired by using the data replication
process.
 Every time data is stored or retrieved; it checks the incoming network
traffic for corrupted data packets.
 Also, S3 is highly scalable in nature where storage requirements can be
scaled automatically based on requirements, and you have to pay for what
you are actually using currently.

After this discussion, the next question that hits your mind directly is what
kind of data can be stored in AWS S3?

Virtually, almost any type of data can be stored with S3 in different formats. If we
are discussing the capacity, the number of objects, and volume that we can store
with S3 are unlimited. An object is considered the basic storage entity in S3 that
consists of a key, data, and the metadata. So, Data can be divided into two
categories further:

 Data that has to be accessed frequently.


 Data not to be accessed frequently.

Geographical Data Storage in Amazon S3


You can decide where your data should be stored geographically. Although making
a wise decision for the region is necessary, yet it needs to be planned well. Four
parameters could help you in deciding the optimal region for the storage. These
are – Price, customer location, Latency, and service availability, etc.

Storage Classes in Amazon S3


Therefore, Amazon comes up with three storage classes to deliver the best
experience to its customers at very affordable rates.

1. Amazon S3 Standard for frequent data access: This standard is


suitable for performance-sensitive use cases where the latency is kept low.
For example, if we talk about a healthcare organization, then frequently
accessed data will be of admitted patients that should be retrieved quickly.
2. Amazon S3 Standard for less frequent data access: It is suitable for
use cases where data is accessed less frequent and long-lived. In the
same hospital, data for discharged patients is not required on a daily basis,
but you may have to access it when the patient returns with any
complication. In that case, the discharge summary for those patients
should be retrieved quickly.
3. Amazon Glacier: It is suitable for use cases when data has to be
archived, but high performance is not needed in those cases. It costs lower
as compared to the other two services. In a hospital, blood report, X-ray
report, or MRI scan docs that are one year older, not required by
healthcare organization anymore.

Data Organization in Amazon S3


Data in Amazon S3 is organized into buckets. A bucket is a logical storage unit in
S3. It contains objects that contain the metadata. Before adding any data to
Amazon S3, the user should create a bucket first that can be used to store objects
ahead.

Moving ahead, let us try to understand the Amazon Simple Storage Service with
the help of an example below.

Amazon S3 with Example


Take an example of the Company that has to launch its storage instances to host
a website for its customers in Indian and USA. To deliver the best customer
experiences, the Company has to choose one region that suits the best its
requirements.

Regions Mumbai/N Virginia


Instance Type (Reserved Instance) e.g amazon ec2-m4.4xlarge 16(vCP

Mumbai -$691/monthly - $0.9 Hourly


Pricing ( 1 Year)
$480/Monthly - $0.6 Hourly

Latency Form USA to India –Low From India

Based on the above parameters, it is pretty clear that N Virginia is just a suitable
region for the Company to host its website because of the low latency and the low
price. Irrespective of your location, you may bid on other regions too that suits your
requirements the best because there is always flexibility to access S3 buckets from
anywhere.

Other than this, there are chances when the website is launched in another region,
and backup is taken in another availability region. This feature is added recently to
the Amazon S3 system and pretty much easy to use. To evaluate your skills, how
much you know about the AWS platform and its technologies, take a quick quiz
first.

Amazon S3 Benefits and Highlights


Amazon S3 has a set of benefits, and a few popular ones are stated below:

 Compliance Capability
 Flexibility Management
 Durability, Scalability, and Availability
 Systematic Work Approach
 Data Transfer Flexibility
 Large Data Storage
 Data archiving
 Backup & Recovery

Let us start with a brief introduction to each of the features one by one:

Read: Top 70 AWS Interview Questions and Answers

1). Compliance Capability


Amazon S3 supports three types of encryption. When it comes to auditing, S3
provides a secure integration with AWS Cloud trail to retain or monitor storage API
call activities. Amazon Macie is an AWS platform that uses machine learning to
arrange, secure, and discover the Aws data automatically. It also supports robust
security standards and compliance certifications too. It helps the customer in
satisfying compliance requirements for every regulatory agency virtually
worldwide.

2). Systematic Work Approach


AWS S3 allows users to run big data analytics on a particular system without
moving it to any other system. It allows users to retrieve the data back that was
needed by S3 objects earlier. It allows users to convert a vast amount of
unstructured data to a structured format with the help of SQL database
programming language.

3). Flexibility Management


With the help of storage administrators, data can be arranged or visualized
quickly. It helps in monitoring data sources and reduces the overall costs while
improving the services. When Amazon S3 is used along AWS Lambda, it helps
customers to log activities and many more functions without focusing on the
infrastructure.

4). Durability, Scalability, and Availability


Amazon S3 is considered as one of the most durable storage services as it works
on the global cloud infrastructure. Data has to pass through three physically
available zones that are far away from the AWS region. It is available at most of
the places and offers an effective way to design and operate databases or apps.

5). Data Transfer Flexibility


Amazon S3 uses plenty of ways for data transfer within the Amazon S3. It is
possible with the help of an API that transfers data through the internet. There is
one direct connection to transfer data within S3 that helps in data transfer to public
or private networks. Aws snowball offers a data transfer system at the petabyte-
level. There is one AWS Storage gateway that offers on-premises storage and
helps in transferring data directly to the cloud through the premises of the user.

6). Data Archiving


Amazon S3 offers the facility to customers where data can be accessed quickly
based on the need for compliance archives. To meet the system retention, Amazon
Glacier provides “Write Once Read Much” Storage. There are lifecycle policies that
make data transitioning between Amazon Glacier and Amazon S3 simpler and
faster.

7). Large Data Storage and Analytics


The Company uses a large amount of data that can be stored in different formats
in S3 buckets, and it can be used as a big data lake too for big data analytics.
Amazon S3 provides us with many services that help us to manage voluminous
data by reducing its costs and optimizing the overall speed of innovation.
8). Backup and recovery
As we have discussed already, S3 is highly durable and secure for data backup
and data archiving. Also, it provides different S3 storage classes too that helps in
optimizing data access and its performance while recovery time objectives.

Cross-region Replication in Amazon S3


Cross-region replication helps users either replicate or transfer the data to some
other location without any hurdles. Obviously, it demands certain costs for its
services that we will discuss in later sections. When compared to traditional data
transfer schemes available over the web, AWS supports two popular mechanisms
for data transfer.

These are Snowball and Transfer Acceleration. Transfer Acceleration helps in


easy, fast, and secure data transfers over long distances by exploiting CloudFront
technologies. CloudFront is a cache service by AWS where the data from client-
side is transferred to the nearest edge location and further data is routed to the S3
bucket over an optimized network path.

Next popular data transfer scheme is Snowball that suggests the interesting idea
of transferring data physically. Here, Amazon sends equipment to premises where
you are free to load the data. It has a kindle attached to it with client’s address
including when it was shipped from Amazon. When the data transfer process is
complete on Snowball, Kindle will change the shipping address back to AWS
headquarters where the Snowball has to be sent.

If you have large batches of data to move, then Snowball is just the perfect choice
to consider. The average turnaround for snowball is five to seven days. In the same
way, Transfer Acceleration can move up to 75 TB of data on a dedicated 1 Gbps
line. It completely depends on the use case and you could take the decision
accordingly. Moving ahead. Let us discuss the overall S3 pricing and how much it
can cost you?

AWS S3 Pricing Structure


“Isn’t anything free on AWS?

If you are a beginner, then you can start with AWS S3 for free. Once signed up,
new users get 5GB of S# standard storage, 20K Get-Requests, and 2K Put-
Requests, 15GB of total data transfer each month for approximately one year. If
you want to exceed this limit, then there is a certain amount that Amazon charges
you. For this purpose, continue reading this blog ahead.

S3 has plenty of features; still, it is affordable and flexible when it comes to the
payment. It allows you to pay for services that you are actually using. The below
table will give you a better idea of S3 pricing for a specific region.
Here is a quick idea of how cross-region replication is billed:
If you replicate 1K objects, then you have to put requests to store 1000 objects and
inter-region data transfer. Once data replication is complete, the 1000 GB will incur
charges based on the destination region.

There are two variants of Snowball

Read: AWS Tutorial - A Detailed Guide to Cloud Computing for Beginner

 Snowball 50 TB: 200$


 Snowball 80 TB: 250$

These are the fixed service charges that you have to pay. Apart from this, you can
check the website, and mostly charges are given exclusive of shipping days,
shipping days are free. Transfer Acceleration pricing is shown in the following table:

Here, AWS S3 charges are quite manageable and affordable when compared to
its benefits. You just have to understand which package suits you the most as per
your requirements.

Amazon S3 Use Cases


Case 1 – Industry Type – Media

Let us understand the concept with the help of a real-time example to assimilate
what we have learned so far. IMDB is a popular internet Movie database to store
details about movies, TV programs, video games, etc. Let us see how did they
exploit the AWS Services to implement this movie database. To get the lowest
possible latency, all possible outputs are calculated in advance with a document
for every combination of letters in search. Each document is further pushed to the
Amazon S3 and thereby to Amazon Cloud Front, and putting documents physically
closer to users. The theoretical number of possible searches that can be calculated
in mind-boggling. For example, a 20-character search has 23 x 1030 combinations.
In practices, by using the IMDB authority on movie and celebrity data, the search
space can be reduced up to 150,000 docs and can be distributed in just a few
hours.

Case 2 – Learn to host a static website on Amazon S3

Let us first learn, what is a static website? Have you any idea about it?

A website that only supports HTML, CSS, or JavaScript but server-side scripting is
not supported. So, if you are looking to host a PHP app or Rails app, then you
should opt for any other option. Moving ahead, let us dive deep to the wonderful
world of hosting a website on AWS S3.

1. Here, you have to create a bucket in the first step, navigate to the S3 in the
AWS Management Console and hot the option “Create Bucket.” Here, you
will be asked to give a bucket name and a region. If you want to add some
subdomain, you can use it for the bucket name. For the region name, pick
something nearest to you and hit “Create.” With a matter of instance, you
can see that your bucket starts appearing in the console.
2. Once the bucket is created successfully, it is the right time to verify a
Bucket. Under “All Buckets” option, search for the bucket name you have
created recently. If it appears in the list, it means the first step is completed
successfully.
3. In the next option, you should enable the Static Website Hosting. The only
thing you have to do is, select the option from Properties panel on the
right. The static website can be hosted entirely on Amazon S3. As soon as
your bucket is enabled for the static website hosting, all the content is
accessible through web browsers via Amazon S3 website endpoint for
your bucket.

4. In this step, you have to create an HTML file. Make sure that the index
document is set to index.html. It is possible to set up an error page
according to requirement. When you are done, just hit the “Save” option.
One more nice thing about Aws Management Console is that you could
upload files to your bucket right from the browser. Let us start by creating
the index.htm page. Here is the HTML code for the same.

5. <!doctype html>
6.
7. <html>
8.
9. <head>
10.
11. <title>
12.
13. Hello, S3!
14.
15. </title>
16.
17. <meta name="description" content="My first S3 website">
18.
19. <meta charset="utf-8">
20.
21. </head>
22.
23. <body>
24.
25. <h2>My first S3 website</h2>
26.
27. <p>I can't believe it was that easy!</p>
28.
29. </body>
30.
31. </html>

32. Now, we will learn how to upload a file in a bucket. To upload a file, you
should select the new bucket option. However, it cannot be seen through
the browser, yet everything in AWS S3 is private.

33. As we just discussed, everything in AWS S3 is private by default. So, you


must be thinking about how to make it public to display on the browser. For
this purpose, just right-click the index.htm file and select the “Make Public”
option. Once it is set to the public option, the home page is visible to the
world.

34. Now select the index.htm file from the console and go to the Properties

Tab.
35. In the last step, you just have to verify the result. Paste the URL link on the
browser and check if it is working or not. Yeah! It is that easy creating a

website in S3. Hurray, we are done.


Congratulations, you have just hosted a website in AWS S3 with a few simple
steps. Once a bucket is created, you can delete it; you can empty a bucket or move
S3 objects from one bucket to another as required.

Object Lifecycle Management in S3


Amazon allows certain actions that can be automatically applied to objects stored
within a particular bucket. These actions are attached as lifecycle sub-resources
to the bucket and consist of the following actions:

 First is Transition action where objects can be moved from one storage
class to another.
 Other is Expiration action where objects are removed automatically when
they expire.

It is clear that transition actions are used to manage storage classes for objects
within a bucket while expiration actions are used to delete an object automatically
as soon as it expires. In this way, it is good if you want to store log files in your
application for a certain time period only. When the defined timeframe will expire,
log files will be removed as per the requirement.

Further, lifecycle management helps in moving objects from one of the real-time
storage classes to the GLACIER class. Here are a few interesting facts you should
know before you complete the data transfer process from one storage class to
another.

 You can move one object to AWS Glacier, but it cannot be moved back to
the standard library. If you want to do it, you should first restore the object
hen copy the object subsequently with the storage class settings.
 It is not possible accessing objects moved from S3 bucket to the Glacier
through Glacier API.
 Amazon stores each 8KB data block of information in S3 when you list all
objects in the real-time.
 It may take up to 5 hours or more when objects are stored from Glacier as
a temporary copy. The object will retain in Glacier until it is not deleted
from that space.

Encryption in Amazon S3
Encryption means encoding an object in such a way that only authorized users can
decode that particular object. Basically, it is possible protecting the data when it is
passed to Amazon servers and while the data is stored to the Amazon. Also, to
protect the data during transmission, you can use SSL (Security Socket Layer) to
transfer HTTP requests. With the help of Amazon’s Java SDK, it is possible to set
up the protocol using “ClientConfiguration.” Here is an example for your reference:

ClientConfiguration clientConf = new ClientConfiguration();

clientConf.setProtocol(Protocol.HTTPS);

AWSCredentialsProvider credentialsProvider = getAwsCredentialsProvid


er();

AwsEnvVarOverrideRegionProvider regionProvider = createRegionProvide


r();

return AmazonS3ClientBuilder.standard()

.withClientConfiguration(clientConf)

.withCredentials(credentialsProvider)

.withRegion(regionProvider.getRegion())

.build();

Data encryption for the stored data can happen in two ways: Client-side Encryption,
and Server-side Encryption. Multiple copies of data are maintained to enable the
regeneration of data in case of data corruption. Also, versioning is performed
wherein each edit is archived for potential retrieval. In the case of server-side
encryption, data is encoded before it is saved to the disc. Your client application
managed the encryption process and sends the data that is already encrypted to
Amazon. The best thing about server-side encryption is that Amazon has already
encrypted the data and performed the key management for you.

Data will be in its original form for the moment while it is stored on Amazon
machines. With the client-side encryption, you can encrypt your own application
and learn to manage keys. This is the way how data is protected in the memory of
your client’s machine and Amazon never sees the original content. Here, you have
to focus on key algorithms and key management yourself.
Amazon offers three different types of Server-Side and client-side
Encryption:

1. Amazon S3-Managed Keys (SSE-S3)


2. AWS KMS-Managed Keys (SSE-KMS)
3. Customer-Provided Keys (SSE-C)
4. Client-side Encryption – KMS
5. Client-side Master Key Encryption
6. Versioning

Amazon S3 Managed Keys (SSE – S3):

Read: How to Become an AWS Certified Solution Architect?


Here, each object is encrypted using a unique key with the help of multi-factor
encryption.

Further, a unique key is also encrypted with a master key that rotates regularly.
This service uses Advanced Encryption standard for encrypting the data. Make
sure that this encryption technique encodes only the object data, not the metadata.

AWS KMS-Managed Keys (SSE-KMS):

This type of encryption offers key management services that are designed to scale
for large distributed apps. Here, the master key is created differently when
compared to the SSE-S3 encryption process. However, both processes allow
creating keys, define policies for them, and audit logs for the usage of each key.
This is the default server-side encryption scheme used by the Amazon. The key
can be used in subsequent calls unless you don’t specify another key.

Customer Provide Keys (SSE-C):

If you don’t like the idea that Amazon provides the key for data encryption, you
have the flexibility to define your own key that is customer-centric key as the part
of your requests. Further, Amazon implements the same key to encrypt or decrypt
the data on the server-side. Amazon stores the HMAC value of the user-created
key to validate future requests. In case, you lost the key created by your then
HMAC value can help you to retrieve the same key and helps in robust key
management.

Client-side Encryption - KMS

The server-side encryption techniques cannot be enough sometimes because


Amazon knows the key value you are using, and there are certain points when raw
data is in hand. To handle this situation, you can use KMS key and encrypt the
data at the client-side. Here, the client first sends the request to KMS service to
retrieve the data for encryption.
The KMS will suggest two random keys and returns the data in two different
versions. The first version is used by the client, and the second version is the
ciphered version of the data key. The ciphered version is sent as the metadata
along with the encrypted data version to Amazon S3. When data is accessed, the
client will load the encrypted and ciphered versions together. It then sends the
ciphered version to the KMS service and receives the plain text as the decoded
data object.

Client-side Master Key Encryption

Here, the client generates a random key to encrypt the data before the final upload.
It then encrypts the data with a master key suggested by the client application and
sends the encrypted data with a master key to Amazon. In this way, Amazon does
have any idea about the raw data and the key used by the client for data encryption.

Versioning

To protect your data, from unwanted deletions or overwrites, you could enable
versioning for a bucket. It will help in creating a new version of the object and
upload the same version instead of overwriting the old data. Versioning is enabled
on a complete bucket instead of any single objects. Once versioning is enabled,
you cannot disable, but you can suspend it.

In this way, encryption is applied in different ways on client-side and server-side


before data is sent to the Amazon S3. You can pick any of them as per your
requirements and convenience.

Final Words:

In this blog for AWS S3 Tutorial, we learned everything about the storage service
from the basics and how to set up a static website using Amazon S3. This guide
discussed the features, benefits, and usage of the service. We also learned
different encryption mechanisms that can be implemented on the raw data as per
the requirement.

At the final leg of AWS S3 Tutorial, we will recommend you joining the AWS
certification program online at JanBask Training and know everything in depth from
the beginning. We wish you luck for a successful career in the AWS space. All the
Best!

You might also like