Item: Amazon S3 (Simple Storage Service)
Rating: 8
Author: Verified User

Use Cases and Deployment Scope

We use Amazon S3 to store all of the data collected from our many mobile applications. All of our big data workloads run on the data stored in S3. It's a cheap and reliable way to store petabyte-scale data.

Pros and Cons

AWS S3 is a secure service with a variety of encryption options available, which makes it a safe bet. We encrypt most of our data using client-side encryption with AWS KMS, where we manage the key, so the data stays secure on the servers.
We started using S3 to avoid the cost of maintaining data nodes on Hadoop EMR for the sole purpose of storing data, which was far more expensive than storing the data on S3.
Because S3 scales virtually without limit and is highly available, we chose this platform to maximize our apps' availability.
Not to mention S3's durability and how easy it is to manage.

AWS should reduce the I/O cost of transferring data to Glacier archives; I believe it is expensive. When the transfer is between AWS resources, the cost should be minimal or negligible.
Calculating object sizes (Get Size) in the new AWS S3 console is considerably slower. The AWS team should maintain an hourly or daily metadata backup of the bucket or object structure so that these results can be fetched faster.
I would love to see fewer outages in the future; a recent long-lasting outage impacted many businesses.
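For anyone hitting the same slow Get Size in the console, a rough alternative is to total object sizes from the ListObjectsV2 API yourself. This is a minimal sketch, assuming boto3's `list_objects_v2` paginator as the page source; the bucket name and prefix in the usage comment are hypothetical placeholders. The function only sums pages shaped like ListObjectsV2 responses, so it can be exercised without AWS access.

```python
from typing import Any, Dict, Iterable, Tuple


def bucket_size(pages: Iterable[Dict[str, Any]]) -> Tuple[int, int]:
    """Return (object_count, total_bytes) summed over ListObjectsV2 pages."""
    count = 0
    total = 0
    for page in pages:
        # A page with no matching keys omits "Contents" entirely.
        for obj in page.get("Contents", []):
            count += 1
            total += obj["Size"]
    return count, total


# Usage sketch (requires boto3 and AWS credentials; names are hypothetical):
# import boto3
# paginator = boto3.client("s3").get_paginator("list_objects_v2")
# print(bucket_size(paginator.paginate(Bucket="my-example-bucket", Prefix="data/")))
```

This still walks every key, so on a petabyte-scale bucket it will not be fast either; the daily BucketSizeBytes storage metric that S3 publishes to CloudWatch is closer to the precomputed figure suggested above.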

Return on Investment

The data we store in AWS S3 is shared with many other AWS account users. Because the requester pays for the data transfer, we need not worry about the cost of moving our data to other accounts.
The data in AWS S3 acts as a centralized storage system: most of our web, mobile, and platform APIs and applications access the same data with very low latency, without the need to replicate it.
Our algorithmic models have evolved and are in a better place now, since they have been trained on the data in S3.

Alternatives Considered

HDFS

As most of our workloads and the underlying platforms are built on EMR, Spark, and AWS Lambda, we did not find HDFS a suitable place to keep all of our data. HDFS was very costly because we had to maintain data nodes solely to provide the extra storage for replication, even when we did not need the extra compute power.

Other Software Used

AWS Lambda, Amazon Relational Database Service, Amazon DynamoDB, Amazon Elastic Compute Cloud (EC2), Amazon Aurora, Amazon Elastic MapReduce

Likelihood to Recommend

It's very well suited when the data generated by your applications is very large, at petabyte scale. The reason to choose S3 is simple: AWS S3 is not only a cheaper solution for storing such a huge volume of data, but it is also durable, scalable, and highly available. Even with the simultaneous loss of two data centers, we need not worry about losing our data. It's the best bet for startups that do not want to invest in their own infrastructure.

S3 at a glance. PROS and CONS.

Overall Satisfaction with Amazon S3 (Simple Storage Service)