Overall Satisfaction with Amazon Redshift
Amazon Redshift is our data warehouse, where we store our processed (hot) data for initiatives such as BI, analytics, and data science.
We also use Amazon Redshift Spectrum to query our data lake on Amazon S3, where we store raw, unprocessed (cold) data for historical analysis, trends, etc.
We store several standard tiers of data in [Amazon] Redshift:
Bronze (ETL-ed data),
Silver (materialized view data), and
Gold (rollups, aggregated and dashboard-ready data).
- [Amazon] Redshift has distribution keys. Defining them correctly on your tables improves query performance. For instance, we can give small mapping/metadata tables the ALL distribution style, so that they are replicated to every node, for fast joins and fast query results.
- [Amazon] Redshift has sort keys. Defining them correctly on your tables, along with the distribution keys above, further improves query performance. It supports both compound (composite) sort keys and interleaved sort keys, to cover various use cases.
- [Amazon] Redshift is based on PostgreSQL; AWS added MPP (massively parallel processing) and column-oriented storage to it, which makes it a powerful data store.
- [Amazon] Redshift has an "Analyze" operation that can be run on tables to update the table statistics used by the query planner. These statistics describe the data held in each table, and keeping them up to date improves query performance.
- Amazon Redshift is a managed service, but it is not 100% managed. We still need to configure WLM (Workload Management) settings and add query queues to make sure its resources aren't wasted and it performs at its best all the time.
- [Amazon] Redshift has a concept of "Vacuum", an operation that reclaims the disk space left behind by deleted data/tables. AWS has since started running vacuum automatically. Before that, we had to run it ourselves at regular intervals to reclaim the space.
- MPP (Massively Parallel Processing)
- Column Oriented data store
- Good Customer Support
- Greater ROI, as it is 1/10th the cost of traditional data stores and data warehouses.
- It is connected to Tableau and Looker dashboards and to various reports used by Sales, Marketing, Publishers, Operations, BI, Analytics, Data Science and Finance.
- Google BigQuery and Amazon EMR (Elastic MapReduce)
We evaluated [Amazon] Redshift vs BigQuery vs Amazon EMR back in 2014.
Back then, BigQuery's cost was slightly higher than [Amazon] Redshift's.
Amazon EMR needs a lot more management (admin tasks), and it is designed to be ephemeral rather than to serve as a data store.
[Amazon] Redshift was ideal in terms of price structure, performance and ROI.
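For readers who have not used them, the distribution key and sort key options mentioned in the list above can be sketched roughly as follows. This is only an illustrative helper that builds Redshift `CREATE TABLE` statements as strings. The function `create_table_ddl` and the table and column names (`country_map`, `page_views`, `user_id`, and so on) are hypothetical and not from our actual schema. The `DISTSTYLE`, `DISTKEY` and `COMPOUND`/`INTERLEAVED SORTKEY` clauses are the table attributes Redshift accepts.

```python
from typing import Optional, Sequence


def create_table_ddl(
    table: str,
    columns: Sequence[str],
    diststyle: str = "AUTO",
    distkey: Optional[str] = None,
    sortkey: Sequence[str] = (),
    interleaved: bool = False,
) -> str:
    """Build a Redshift CREATE TABLE statement (hypothetical helper)."""
    if diststyle not in {"AUTO", "EVEN", "KEY", "ALL"}:
        raise ValueError(f"unknown DISTSTYLE: {diststyle}")
    # A distribution key column only makes sense with DISTSTYLE KEY.
    if (diststyle == "KEY") != (distkey is not None):
        raise ValueError("DISTKEY must be given exactly when DISTSTYLE is KEY")

    parts = ["CREATE TABLE " + table + " (\n    " + ",\n    ".join(columns) + "\n)"]
    parts.append(f"DISTSTYLE {diststyle}")
    if distkey:
        parts.append(f"DISTKEY ({distkey})")
    if sortkey:
        kind = "INTERLEAVED" if interleaved else "COMPOUND"
        parts.append(f"{kind} SORTKEY ({', '.join(sortkey)})")
    return "\n".join(parts) + ";"


# Small mapping table: replicate a full copy to every node (assumed example).
dim = create_table_ddl(
    "country_map",
    ["country_code CHAR(2)", "country_name VARCHAR(64)"],
    diststyle="ALL",
)

# Large fact table: distribute by the join column, sort by the filter columns.
fact = create_table_ddl(
    "page_views",
    ["user_id BIGINT", "country_code CHAR(2)", "viewed_at TIMESTAMP"],
    diststyle="KEY",
    distkey="user_id",
    sortkey=["viewed_at", "country_code"],
)

print(dim)
print(fact)
```

Which columns to pick depends on the workload: a column used in large joins is the usual candidate for the distribution key, and columns used in range filters are the usual candidates for the sort key.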
Do you think Amazon Redshift delivers good value for the price?
Yes
Are you happy with Amazon Redshift's feature set?
Yes
Did Amazon Redshift live up to sales and marketing promises?
Yes
Did implementation of Amazon Redshift go as expected?
Yes
Would you buy Amazon Redshift again?
Yes