S3 at a glance. PROS and CONS.
June 19, 2017
Score 8 out of 10
Overall Satisfaction with Amazon S3 (Simple Storage Service)
We use Amazon S3 to store all of the data collected from our many mobile applications. All of our Big Data workloads run on the data stored in S3. It is a cheap and reliable way to store petabyte-scale data.
- AWS S3 is a secure service with a variety of encryption options available, which makes it a safe bet. We encrypt most of our data client-side using AWS KMS, where we manage the keys, and the data stays secure on the servers.
- We started using S3 to avoid the cost of maintaining data nodes on Hadoop EMR solely for storing data, which was far more expensive than storing the data on S3.
- Because S3 scales virtually without limit and is highly available, we chose this platform to make sure our apps have maximum availability.
- Not to mention the durability of S3 and how easy it is to manage.
- AWS should reduce the cost of I/O when transferring data to Glacier archives. It's expensive, I believe. When the transfer is between AWS resources, the cost should be minimal or negligible.
- Scanning for object size (Get Size) in the new AWS S3 console is considerably slow. The AWS team should maintain an hourly or daily metadata snapshot of the bucket or object structure so that results can be fetched faster.
- I would love to see fewer outages in the future, as the recent long-lasting outage impacted many businesses.
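For anyone hitting the slow size scan mentioned above, one workaround is to total the sizes yourself from the object listing instead of waiting on the console. What follows is only a minimal sketch, assuming the boto3 library is installed and credentials are configured; the function names `summarize_pages` and `bucket_size` and the bucket and prefix values are hypothetical, not anything from our setup.

```python
from typing import Iterable, Tuple


def summarize_pages(pages: Iterable[dict]) -> Tuple[int, int]:
    """Return (object_count, total_bytes) for ListObjectsV2-style pages.

    Each page is a dict that may hold a "Contents" list whose entries
    carry a "Size" in bytes. Empty pages have no "Contents" key.
    """
    count = 0
    total = 0
    for page in pages:
        for obj in page.get("Contents", []):
            count += 1
            total += obj["Size"]
    return count, total


def bucket_size(bucket: str, prefix: str = "") -> Tuple[int, int]:
    """Walk a bucket (or prefix) with the boto3 paginator and total it.

    Assumption: boto3 is available; it is imported lazily so the pure
    helper above can be used and tested without it.
    """
    import boto3  # third-party, assumed installed

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    pages = paginator.paginate(Bucket=bucket, Prefix=prefix)
    return summarize_pages(pages)


# Hypothetical usage (bucket and prefix names are placeholders):
# count, total = bucket_size("my-example-bucket", "events/2017/")
```

Listing is still billed per request and is linear in the number of objects, so on a very large bucket this is a convenience rather than a real fix.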
- The data we store in AWS S3 is shared with a lot of other AWS account users. The requester pays for the data transfer, so we need not worry about the cost of moving our data to other accounts.
- The data in AWS S3 acts as a centralized storage system: most of our web, mobile, and platform APIs and applications access the same data with very low latency and without the need to replicate it.
- Our models have evolved and are in a better place now, as they have been trained on the data in S3.
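The requester-pays arrangement mentioned above can be sketched roughly as follows. This is an illustration under assumptions, not our actual setup: it assumes a boto3 S3 client is passed in, and the helper names `enable_requester_pays` and `fetch_as_requester` as well as the bucket and key values are hypothetical.

```python
def enable_requester_pays(s3, bucket: str) -> None:
    """Bucket owner side: make requesters pay for requests and transfer.

    `s3` is expected to be a boto3 S3 client (assumption).
    """
    s3.put_bucket_request_payment(
        Bucket=bucket,
        RequestPaymentConfiguration={"Payer": "Requester"},
    )


def fetch_as_requester(s3, bucket: str, key: str) -> dict:
    """Consumer side: acknowledge the charge when reading an object.

    Without RequestPayer="requester", S3 rejects the request on a
    requester-pays bucket.
    """
    return s3.get_object(Bucket=bucket, Key=key, RequestPayer="requester")


# Hypothetical usage (names are placeholders):
# import boto3
# s3 = boto3.client("s3")
# enable_requester_pays(s3, "my-shared-bucket")
# obj = fetch_as_requester(s3, "my-shared-bucket", "datasets/part-0000.parquet")
```

The other accounts still need read permission on the bucket through a bucket policy or similar grant; the setting only changes who is billed.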
- HDFS
As most of our workloads and the underlying platforms are built on EMR, Spark, and AWS Lambda, we did not find HDFS a suitable place to keep all of our data. HDFS was very costly because we had to maintain data nodes solely to provide the extra storage for replication, even when we did not need the extra compute power.