Amazon S3 (Simple Storage Service): a great fit for a data lake implementation
Use Cases and Deployment Scope
Pros
- Bucket name uniqueness: it forces you to adopt at least a rudimentary naming convention
- Flexibility in bucket management: policies, versioning, etc.
- Available APIs: it is easy to interact with Amazon S3 (Simple Storage Service) thanks to the various APIs for reading, writing, and updating objects
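To illustrate the API point, here is a minimal sketch of writing, updating, and reading a text object in Python. It assumes a boto3 S3 client is passed in as `s3` (for example `boto3.client("s3")` with credentials already configured). The helper names `write_text` and `read_text` are hypothetical, not part of any AWS library.

```python
def write_text(s3, bucket, key, text):
    """Create an object, or overwrite it if the key already exists."""
    # Assumption: `s3` is a boto3 S3 client. In S3 an "update" is a full
    # overwrite of the object stored under the same key.
    s3.put_object(Bucket=bucket, Key=key, Body=text.encode("utf-8"))


def read_text(s3, bucket, key):
    """Fetch an object and decode its body as UTF-8 text."""
    response = s3.get_object(Bucket=bucket, Key=key)
    # "Body" is a stream; read() returns the raw bytes of the object.
    return response["Body"].read().decode("utf-8")
```

Passing the client in as an argument, rather than creating it inside the helpers, keeps them easy to exercise against a stub without touching a real bucket.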
Cons
- UI: it could be more intuitive, especially when dealing with deleted objects
- Filter on the prefix (partial) name: in a lot of cases, the precise full path and name of the object must be known to find it
- It’s very easy to write policies that are too broad or to lock yourself out of a bucket completely; it would be nice to have some guardrails in place
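As a rough idea of the kind of guardrail meant in the last point, the sketch below scans a bucket policy document before it is applied and flags `Allow` statements that grant access to every principal without any `Condition`. It covers only the "too broad" half of the problem, not lockouts. The function name `find_public_statements`, the bucket name, and the sample policy are hypothetical, and a real review would need many more checks.

```python
import json


def _as_list(value):
    """Policy fields may hold a single value or a list; normalise to a list."""
    return value if isinstance(value, list) else [value]


def find_public_statements(policy_json):
    """Return Allow statements open to any principal ("*") with no Condition."""
    policy = json.loads(policy_json)
    flagged = []
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal")
        # A wildcard can appear as "*" or as {"AWS": "*"} / {"AWS": ["*", ...]}.
        is_wildcard = principal == "*" or (
            isinstance(principal, dict)
            and "*" in _as_list(principal.get("AWS", []))
        )
        if is_wildcard and "Condition" not in stmt:
            flagged.append(stmt)
    return flagged


# Hypothetical policy for an illustrative bucket name.
sample_policy = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "PublicRead",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::example-bucket/*",
    }],
})
print([s.get("Sid") for s in find_public_statements(sample_policy)])
```

A check like this could run in a deployment pipeline and block the change whenever the returned list is non-empty.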
Return on Investment
- Affordable: the entire data lake and most of our raw data are on Amazon S3 (Simple Storage Service), and it’s not the most expensive AWS service we use
- Easy to onboard to: we are aiming for 100% of our data to be synced to Amazon S3 (Simple Storage Service) in some form, so that all data is located in a single place
- Good integration with other systems, which reduced our overall costs and the time needed to reach a decision


