Overall Satisfaction with Amazon Redshift
Redshift is currently being used to house normalized client data pulled from various third-party endpoints. That data is accessed directly by our business intelligence and CRM platforms, and is also made available through our own API gateways. We chose Redshift for its ability to support a "big data" environment with high availability.
- If you need to draw insights from immense amounts (read: petabytes) of transactional (repetitive) data in near real time (think machine learning and business intelligence), and you're already in the AWS ecosystem, it's your only real option. It performs very well.
- Highly configurable, intelligent compression of repetitive columns reduces your storage footprint, contributing to extremely high performance.
- As with most things in the AWS ecosystem, it scales seamlessly and endlessly.
- There is no support for data de-duplication, meaning duplicates either have to be accounted for upstream, or you'll have to build your own services to de-dupe your data.
- Its strength is housing and querying data, not frequent small insertions. While it has a SQL interface, it shouldn't be approached the same way as a typical relational database.
- Permissions can be a pain. Dovetailing with my previous con: in some instances it's easier to drop and rebuild a table than to navigate incremental updates and insertions, but retaining user permissions across that rebuild is a pain point.
- To be honest, we haven't yet seen the return, as we aren't yet operating at scale. So far we've seen only the added engineering overhead of dealing with Redshift's eccentricities, not the positive impacts.
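Because Redshift won't de-duplicate for you, a common workaround is the staging-table merge pattern: load the incoming batch into a staging table, delete any rows in the target that the batch supersedes, then insert the batch. The sketch below is a minimal illustration of that pattern; the table and column names are hypothetical, and sqlite3 stands in for Redshift purely so the example runs locally (the SQL shown is portable to Redshift).

```python
import sqlite3

# Stand-in database; in practice this would be a Redshift connection.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
cur.execute("CREATE TABLE events_staging (id INTEGER, payload TEXT)")

# Existing row 1, plus an incoming batch that re-sends row 1 with fresh data.
cur.execute("INSERT INTO events VALUES (1, 'old')")
cur.executemany("INSERT INTO events_staging VALUES (?, ?)",
                [(1, "new"), (2, "first")])

# Merge: delete rows the batch supersedes, then insert the whole batch.
cur.execute("DELETE FROM events WHERE id IN (SELECT id FROM events_staging)")
cur.execute("INSERT INTO events SELECT id, payload FROM events_staging")
cur.execute("DELETE FROM events_staging")
conn.commit()

rows = cur.execute("SELECT id, payload FROM events ORDER BY id").fetchall()
print(rows)  # [(1, 'new'), (2, 'first')] -- no duplicate id 1
```

The delete-then-insert shape matters here: because there's no enforced uniqueness to lean on at scale, the merge itself is what keeps duplicates out.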
It is well suited for:
- Petabytes of data requiring near real-time analysis
- Massive data insertions
- Massive data reads

It is less appropriate for:
- Web apps
- Smaller transactional inserts
- Smaller reads
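The insert-side distinction comes down to batching: Redshift rewards large batched loads and punishes row-at-a-time inserts. A minimal sketch of the idea, collapsing many rows into one multi-row INSERT instead of one statement per row (the table and column names are hypothetical, and the values are assumed to be pre-formatted SQL literals; for truly massive loads, a bulk load from S3 via COPY is the usual route):

```python
def multi_row_insert(table, columns, rows):
    """Build one multi-row INSERT statement instead of one per row.

    Values are assumed to be pre-formatted SQL literals; a real loader
    would use parameter binding or COPY rather than string building.
    """
    cols = ", ".join(columns)
    vals = ", ".join(
        "(" + ", ".join(str(v) for v in row) + ")" for row in rows
    )
    return f"INSERT INTO {table} ({cols}) VALUES {vals};"

sql = multi_row_insert("metrics", ["id", "value"], [(1, 10), (2, 20)])
print(sql)  # INSERT INTO metrics (id, value) VALUES (1, 10), (2, 20);
```

One statement touching many rows amortizes Redshift's per-statement overhead, which is exactly why the "smaller transactional inserts" workload above is a poor fit.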