After Four Months with Redshift
September 13, 2017
After Four Months with Redshift
Score 6 out of 10
Overall Satisfaction with Amazon Redshift
Amazon Redshift is being used by many business units within our company. It is our new data warehousing platform.
- Redshift seems to be as fast processing a large dataset as it is with a small one. It seems, when the dataset size is significantly increased (10x, 100x, 1000x, etc.), DML queries are often executed within the same amount of time.
- Redshift has a powerful graphical admin tool to monitor the ongoing queries in real time and historically.
- Easily expandable capacity. Automatic snapshots that eliminate the need for managing backups. Simple database maintenance strategies with the VACUUM and ANALYZE commands.
- Abundance of detailed documentation and tutorials.
- It could benefit from adding data integrity and programming tools common to other database management systems.
- Amazon Redshift is based on PostgreSQL 8.0.2. That version of PostgreSQL was released in December 2006. While PostgreSQL was much improved since then, the new features were not implemented in Redshift. Many basic features are missing from it.
- Primary keys can be declared but not enforced. Referential integrity (foreign keys) can be declared but not enforced. UNIQUE and CHECK constraints are not supported and cannot be declared.
- IDENTITY can be declared on a column, and Redshift will put unique values into it. However: IDENTITY values in the newly inserted rows won’t be incremental or sequential. To implement a sequential number, you need to write your own custom code.
- There are no stored procedures in Redshift. We are writing SQL script files, and then parsing and running them one statement at a time from a Python program. This also enabled us to implement execution-time error logging.
- In SQL scripts, to check for the row count of affected rows, a complicated join query against some system tables or views has to be executed.
- Data Control Language (DCL) does not exist. No statements like IF, WHILE, DO, RAISERROR, etc.
- On performance of views… Views do not “pass-through” a query parameter which is a potential problem for performance.
- When selecting against a view with the WHERE clause outside of the view, the inner query of the view will be executed first without consideration for the WHERE clause, and only then the WHERE clause will be applied.
- Certain clauses of SQL work many times faster than other clauses. So be careful and test your statements for performance earlier rather than later, especially if working with a large data set.
- There was a situation when DELETE FROM JOIN was unacceptably slow. Replacing JOIN with the USING clause made DELETE instantaneous.
- Our company is moving to the AWS infrastructure, and in this context moving the warehouse environments to Redshift sounds logical regardless of the cost.
- Development organizations have to operate in the Dev/Ops mode where they build and support their apps at the same time.
- Hard to estimate the overall ROI of moving to Redshift from my position. However, running Redshift seems to be inexpensive compared to all the licensing and hardware costs we had on our RDBMS platform before Redshift.
It was a company-wide decision to move to AWS, so we did not get to compare Redshift against SQL Server Azure.