Overall Satisfaction with Amazon Redshift
Amazon Redshift is used as the central data warehouse. It's main use is for data analytics and reporting. In addition, it is also used by batch jobs to perform various business functions like email lists of delinquent customers.
- Fast analytical queries. The shared nothing and column oriented architecture makes querying very quick compared to databases like Oracle that are designed for OLTP. Scaling is a synch since you can scale out by adding more nodes.
- Easy table modelling. The only tough decisions you have to make are what your distribution schemes and sort keys are going to be. This is a lot easier than defining partition and index schemes in databases like Oracle or MySQL.
- Not much maintenance. Almost everything is managed by Amazon. The only exception is table vacuuming and analysis. I was able to program simple ETL jobs to perform this.
- Works with pretty much anything that works with Postgres. It's hard to find a tool that it isn't compatible with.
- Lack of enforced constraints (except NOT NULL column constraints). You have to be very careful in your testing to make sure that you aren't duplicating rows.
- No stored procedure support. Everything must be accomplished through ETL
- Write operations are very slow and complex.Native SQL row level INSERT and UPDATE statements take an extremely long time to execute. In order to get around this for external data that needs to be loaded, you have to bulk load the data from a flat file to a stage table, then upsert the data from the stage table to your destination table. For data already present in the database, ELT is the only viable way of transforming the data.
- No good native data modelling tools.
- Random nondescript errors happen occasionally. The error messages are not decipherable and forums will have no clues as to what happened. It is just a fact of life.
- No trigger support.
- OLTP style queries are painfully slow. Don't even think about using Redshift for OLTP...
- Redshift has had a very positive impact on our business. It has been used to provide analytics on marketing campaigns to boost revenue.
- Redshift is instrumental in our payment collection business processes. It powers everything from who gets called to who gets sent collection emails.
- Compared to Oracle Data Warehouse, Redshift is a better data warehouse. However, this comes at a cost of advanced functionality and the ability to do OLTP style processing. What you gain is faster querying time and better scalability.
- Compared to MySQL, you gain a WHOLE lot. MySQL is terrible for data warehousing and is still gaining features that other databases have had for years (ie. hash joins).
- Compared to Teradata, Redshift is a far cheaper option. This comes at the expense of functionality like partitioning and indexing. For the money though, Redshift is still far better since I personally believe you get much more bang for your buck.
For data warehousing and analytics, Redshift can't be beat. It's price point, minimal maintenance, and OLAP query optimization make it excellent for querying and reporting for an organization with a small budget. As long as you can live without some standard database tools like constraints and stored procedures, it is an excellent database.