Item: Amazon Redshift
Rating: 6
Author: Michael Romm

Overall Satisfaction with Amazon Redshift

Use Cases and Deployment Scope

Amazon Redshift is being used by many business units within our company. It is our new data warehousing platform.

Pros and Cons

Pros

Redshift seems to be as fast processing a large dataset as it is with a small one. It seems, when the dataset size is significantly increased (10x, 100x, 1000x, etc.), DML queries are often executed within the same amount of time.
Redshift has a powerful graphical admin tool to monitor the ongoing queries in real time and historically.
Easily expandable capacity. Automatic snapshots that eliminate the need for managing backups. Simple database maintenance strategies with the VACUUM and ANALYZE commands.
Abundance of detailed documentation and tutorials.

Cons

It could benefit from adding data integrity and programming tools common to other database management systems.
Amazon Redshift is based on PostgreSQL 8.0.2. That version of PostgreSQL was released in December 2006. While PostgreSQL was much improved since then, the new features were not implemented in Redshift. Many basic features are missing from it.
Primary keys can be declared but not enforced. Referential integrity (foreign keys) can be declared but not enforced. UNIQUE and CHECK constraints are not supported and cannot be declared.
IDENTITY can be declared on a column, and Redshift will put unique values into it. However: IDENTITY values in the newly inserted rows won’t be incremental or sequential. To implement a sequential number, you need to write your own custom code.
There are no stored procedures in Redshift. We are writing SQL script files, and then parsing and running them one statement at a time from a Python program. This also enabled us to implement execution-time error logging.
In SQL scripts, to check for the row count of affected rows, a complicated join query against some system tables or views has to be executed.
Data Control Language (DCL) does not exist. No statements like IF, WHILE, DO, RAISERROR, etc.
On performance of views… Views do not “pass-through” a query parameter which is a potential problem for performance.
When selecting against a view with the WHERE clause outside of the view, the inner query of the view will be executed first without consideration for the WHERE clause, and only then the WHERE clause will be applied.
Certain clauses of SQL work many times faster than other clauses. So be careful and test your statements for performance earlier rather than later, especially if working with a large data set.
There was a situation when DELETE FROM JOIN was unacceptably slow. Replacing JOIN with the USING clause made DELETE instantaneous.

Return on Investment

Our company is moving to the AWS infrastructure, and in this context moving the warehouse environments to Redshift sounds logical regardless of the cost.
Development organizations have to operate in the Dev/Ops mode where they build and support their apps at the same time.
Hard to estimate the overall ROI of moving to Redshift from my position. However, running Redshift seems to be inexpensive compared to all the licensing and hardware costs we had on our RDBMS platform before Redshift.

Alternatives Considered

It was a company-wide decision to move to AWS, so we did not get to compare Redshift against SQL Server Azure.

Other Software Used

AWS Lambda, Microsoft SQL Server, ER/Studio

Redshift is a viable platform to house a large or very large data warehouse designed for performance and scalability. It is especially well-suited in the cases where your source data is already stored inside of the AWS services infrastructure.

Comments

Please log in to join the conversation

Michael Kidd
commented 4 years ago
This was a 4-months-in review. Did it get better or worse?

After Four Months with Redshift