Amazon S3 Glacier (S3 Glacier) is a storage service that keeps backups, archives, and other low-cost data safe and accessible for the long term. Its features make it well suited to storing large amounts of rarely accessed data, and it can make storage significantly more cost-effective when used properly. S3 Glacier does the operational work for you, removing the hassle and cost of operating and scaling storage on Amazon Web Services. Companies no longer need to plan hardware capacity or verify that hardware is set up correctly; S3 Glacier detects hardware failures and recovers from them automatically. This frees IT teams to focus on building better applications and expanding their business globally.
Amazon Simple Storage Service (Amazon S3) offers three S3 Glacier archive storage classes, each designed for a different purpose: some suit data that must be retrieved quickly on occasion, while others suit data kept for years and retrieved rarely.
These S3 Glacier storage classes differ as follows:
-
S3 Glacier Instant Retrieval – For archiving rarely accessed data (for example, data accessed about once per quarter) that still requires millisecond retrieval.
-
S3 Glacier Flexible Retrieval (formerly the S3 Glacier storage class) – For data that may need to be retrieved, in whole or in part, within minutes to hours:
-
For Expedited retrievals - typically as little as 1-5 minutes.
-
For Bulk retrievals - roughly 5-12 hours.
-
S3 Glacier Deep Archive – For archiving data that is rarely, if ever, accessed; standard retrievals complete within the default of 12 hours.
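The storage class for new uploads is chosen at write time with the StorageClass parameter of PutObject. A minimal sketch of that choice, with the client injected so it works with any boto3 S3 client (the function name and bucket/key values here are our own illustrative placeholders, not part of the AWS API):

```python
# The archive storage classes and the access pattern each targets.
ARCHIVE_CLASSES = {
    "GLACIER_IR": "millisecond retrieval; data accessed about once a quarter",
    "GLACIER": "minutes-to-hours retrieval (Flexible Retrieval)",
    "DEEP_ARCHIVE": "lowest cost; default 12-hour retrieval",
}

def archive_upload(s3_client, bucket, key, body, storage_class="DEEP_ARCHIVE"):
    """Upload an object directly into one of the Glacier storage classes."""
    if storage_class not in ARCHIVE_CLASSES:
        raise ValueError(f"not an archive storage class: {storage_class}")
    return s3_client.put_object(
        Bucket=bucket, Key=key, Body=body, StorageClass=storage_class
    )
```

In practice you would pass `boto3.client('s3')` as `s3_client`; objects uploaded this way land in the archive tier immediately, without waiting for a lifecycle transition.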
Use cases of Amazon S3 Glacier
Medical images and patient reports are generally not accessed after a year of storage, but they might be needed in the future; that is why medical image archiving is a mandatory process in the healthcare industry. Reliable archives make it easier for patients to get appointments, and doctors can spend more time with their patients instead of searching for patient data before appointments.
Digital media companies need to store lots of data, and as the data grows, so does the demand for storage. Likewise, many businesses in the finance and healthcare sectors must keep data archives for regulatory, compliance, and business-policy purposes. These archives need to be reliable and secure, which can be expensive.
Research institutions should be able to focus on their research and studies without worrying about how to store or access large amounts of data. A storage service that is reliable and secure, stores large amounts of data cheaply, and retrieves it quickly frees institutions from that workload.
Data integrity is a challenge for businesses of all sizes. Businesses must run regular, systematic data integrity checks to make sure data is secure and protected. A storage service designed with this in mind lets businesses save money and transition more easily to a new storage class.
Magnetic tape storage is slow, difficult to use, and prone to damage; it often demands large upfront investments and constant upkeep. For businesses looking for a long-term backup plan, low-cost archive storage with fast retrieval times can be crucial to cutting on-premises storage costs.
AWS S3 Glacier is a strong option for such businesses because it meets all of the above requirements affordably, helping companies keep older media files in long-term preservation while retaining quick access. The service benefits sectors including finance, healthcare, life sciences, research and development institutions, data companies, and many more.
Challenge: Retrieve all the files asynchronously.
When you retrieve an archive from Amazon S3 Glacier, you face a particular challenge: retrieval is a two-step, asynchronous process. First you initiate a job request; once the job completes, you can download the archive. To initiate the retrieval job, you use the Initiate Job (POST jobs) REST API operation, or the equivalent operations in the AWS CLI or AWS SDKs.
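For objects in the S3 Glacier storage classes, the two-step pattern maps onto the S3 RestoreObject and HeadObject operations. A minimal sketch, with the client injected so it works with any boto3 S3 client (the two function names are our own, not AWS API names):

```python
def initiate_restore(s3_client, bucket, key, days=10, tier="Standard"):
    """Step 1: submit the asynchronous restore job for one archived object."""
    return s3_client.restore_object(
        Bucket=bucket,
        Key=key,
        RestoreRequest={
            "Days": days,  # how long the restored copy stays readable
            "GlacierJobParameters": {"Tier": tier},  # Expedited | Standard | Bulk
        },
    )

def restore_is_complete(s3_client, bucket, key):
    """Step 2: poll the object's Restore header until the job has finished."""
    head = s3_client.head_object(Bucket=bucket, Key=key)
    restore = head.get("Restore")
    return restore is not None and 'ongoing-request="false"' in restore
```

Step 1 returns immediately; the restored copy only becomes readable once step 2 reports completion, which is why the wait between the two steps cannot be avoided.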
In the solution below, we list detailed steps for retrieving archives from S3 Glacier using a Python script (which can also be packaged as a Lambda function).
Solution: How to Restore S3 Glacier using Lambda Function?
STEP 0 : Install dependencies
Install boto3 (if not already installed) by running the following command in the terminal.
pip install boto3
After installation, check whether boto3 was installed successfully (boto3 has no standalone command-line binary, so check from Python):
python -c "import boto3; print(boto3.__version__)"
STEP 1 : Choose the S3 Bucket Name and Folder to be restored from Glacier.
Select the bucket name and folder you would like to restore from Glacier Storage.
(For example, bucket name 'gagenservice-mandanten' and folder name 'TVS/Stammblätter/2019'.)
STEP 2 :
Open the Python program S3restore.py and replace the placeholder fields below with the bucket name and prefix that you want to restore.
import boto3
from botocore.exceptions import ClientError

# Mention the bucket name where you have Glacier storage
S3_BUCKET = 'bucketname-mandanten'
# Mention the folder path (prefix) where you have Glacier storage
S3_PREFIX = 'TVS/Stammblätter/2018'

s3_resource = boto3.resource('s3')
bucket = s3_resource.Bucket(S3_BUCKET)

print("\nYour bucket contains the following objects:")
try:
    for o in bucket.objects.filter(Prefix=S3_PREFIX):
        print(o.key)
        obj = s3_resource.Object(o.bucket_name, o.key)
        print(obj.storage_class)
        # Restore the object if the storage class is GLACIER and the
        # object does not have a completed or ongoing restoration request.
        if obj.storage_class == 'GLACIER':
            if obj.restore is None:
                print('Submitting restoration request: %s' % obj.key)
                obj.restore_object(RestoreRequest={'Days': 10})
            # Objects whose restoration is ongoing
            elif 'ongoing-request="true"' in obj.restore:
                print('Restoration in progress: %s' % obj.key)
            # Objects whose restoration is complete
            elif 'ongoing-request="false"' in obj.restore:
                print('Restoration complete: %s' % obj.key)
        else:
            print('No restore needed for %s' % obj.key)
except ClientError as err:
    print(f"Couldn't list the objects in bucket {bucket.name}: {err}")
STEP 3 :
Then save the file and run it with python S3restore.py from the command line or Mac terminal.
Once the jobs are submitted, wait for the restores to complete; this typically takes a few hours (standard Flexible Retrieval jobs finish in roughly 3-5 hours, and Deep Archive standard retrievals within 12 hours).
STEP 4 :
Copy the restored files into another S3 bucket
Create a new bucket called testbucket-restore (if not created)
STEP 5 :
Edit the Python program s3copyafter-restore.py with the details below:
Replace the placeholder fields with the previously restored bucket and folder names, and with the target bucket name.
======================
s3copyafter-restore.py
# This script copies all objects from one bucket to another
import boto3
from botocore.exceptions import ClientError

# Source bucket and prefix
S3_BUCKET = 'bucket-mandanten'
S3_PREFIX = 'TVS/Stammblätter/2018'

s3_resource = boto3.resource('s3')
bucket = s3_resource.Bucket(S3_BUCKET)

# Destination bucket
dest_bucket = s3_resource.Bucket('testbucket-restore')

print(bucket)
print(dest_bucket)

try:
    for obj in bucket.objects.filter(Prefix=S3_PREFIX):
        dest_key = obj.key
        print(dest_key)
        s3_resource.Object(dest_bucket.name, dest_key).copy_from(
            CopySource={'Bucket': obj.bucket_name, 'Key': obj.key})
except ClientError as err:
    print(f"Couldn't copy the objects in bucket {bucket.name}: {err}")
STEP 6 :
Wait for the program to complete and then run the following in the terminal:
>> cd Archive
>> aws s3 sync s3://Bucketname/TVS/Stammblätter/2017/ .
STEP 7 :
Delete the Archival files/folders from the S3 bucket as needed.
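Step 7 can be scripted as well. A minimal sketch that deletes every object under a prefix in batches of up to 1,000 keys (the DeleteObjects limit), with the client injected so it works with any boto3 S3 client; the function name is our own:

```python
def delete_prefix(s3_client, bucket, prefix):
    """Delete every object under `prefix`; returns the number deleted."""
    deleted = 0
    token = None
    while True:
        kwargs = {"Bucket": bucket, "Prefix": prefix}
        if token:
            kwargs["ContinuationToken"] = token
        page = s3_client.list_objects_v2(**kwargs)
        keys = [{"Key": o["Key"]} for o in page.get("Contents", [])]
        if keys:
            # DeleteObjects accepts up to 1000 keys; list_objects_v2
            # pages are capped at 1000, so one call per page suffices.
            s3_client.delete_objects(Bucket=bucket, Delete={"Objects": keys})
            deleted += len(keys)
        if not page.get("IsTruncated"):
            return deleted
        token = page.get("NextContinuationToken")
```

Deleting the archived originals only after verifying the copies in the destination bucket is the safer order of operations.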
Advantages of using Amazon S3 Glacier
-
Low Cost & Pricing: S3 Glacier storage classes are built for cost-effective storage for specific access patterns, enabling you to archive large amounts of data at a very minimal cost.
The storage price for S3 Glacier is minimal, on the order of $0.004 per GB per month, with S3 Glacier Deep Archive cheaper still.
With no upfront costs or minimum commitments, you simply pay for what you use.
-
Long-term Archival Storage: S3 Glacier is highly flexible, archiving any type and amount of "cold" data, i.e. files that are not needed all the time but might be needed in the future. This "cold" data metaphor is how S3 Glacier got its name.
-
Faster Retrieval Time: It offers retrieval options (expedited, standard, bulk) spanning milliseconds to hours to meet your organization's requirements. Within S3 Glacier Flexible Retrieval, expedited retrievals take minutes, standard retrievals 3-5 hours, and bulk retrievals 5-12 hours.
-
Speed: First-byte latency is selectable in minutes or hours, i.e. once you start a retrieval at your chosen tier, the first byte becomes available within that tier's window.
-
Security & Compliance: S3 Glacier storage classes offer increased security standards and compliance certifications that include SEC Rule 17a-4, PCI-DSS, HIPAA/HITECH, FedRAMP, EU GDPR, and FISMA. With WORM storage capabilities enabled by Amazon S3 Object Lock, compliance requirements for almost all regulatory agencies worldwide can be met.
-
Durability: S3 Glacier is highly durable, designed for 99.999999999% (11 nines) durability. It stores data redundantly on multiple devices across at least three geographically separated Availability Zones within an AWS Region, so it can sustain device failures through quick detection and repair.
-
Scalability: Glacier storage classes are highly scalable. The amount of data stored in the S3 Glacier storage classes can be scaled up or down to match business and user requirements.
-
Availability: S3 Glacier is designed for 99.9% availability of objects over a given year for data that is rarely accessed but needs instant access. This benefits companies involved in image hosting, file sharing, hospital systems, digital media archives, and user-generated content (UGC).
-
More partners and vendors with solutions: Amazon S3 object storage is supported by thousands of consulting, systems integrator, and independent software vendor partners, in addition to integration with most AWS services, and more join every month. No other cloud provider has more partners with solutions pre-integrated to work with its service.
-
Consistency: S3 Glacier fits consistently into the S3 lifecycle, which optimizes cost by choosing the least expensive storage path: lifecycle rules transition data to cheaper S3 storage classes as it becomes infrequently or rarely accessed.
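Such lifecycle transitions are expressed as a rule set passed to the S3 PutBucketLifecycleConfiguration operation. A minimal sketch, where the rule ID, prefix, and day thresholds are illustrative placeholders:

```python
# Lifecycle rules that move objects down the storage classes as they age.
LIFECYCLE_RULES = {
    "Rules": [
        {
            "ID": "archive-old-reports",          # hypothetical rule name
            "Filter": {"Prefix": "reports/"},     # hypothetical prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 90, "StorageClass": "GLACIER"},        # after a quarter
                {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},  # after a year
            ],
        }
    ]
}

# Applied with a boto3 S3 client:
# s3.put_bucket_lifecycle_configuration(
#     Bucket="example-bucket", LifecycleConfiguration=LIFECYCLE_RULES)
```

Once such a rule is in place, the tiering happens automatically, with no per-object restore or copy scripts needed on the way down.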
Conclusion
S3 Glacier is a storage service designed for data that is not used often. It comes in three storage classes, each with its own set of features and benefits, so you can choose the class best suited to your needs. Organizations can use the S3 Glacier storage classes to store large amounts of data at different price points, and to move data from one storage class to another at petabyte scale.
In this post, we covered the Amazon S3 Glacier storage classes: what Amazon S3 Glacier is, its advantages and use cases, the challenge of retrieving all files asynchronously, and a solution to that challenge.