What users are saying about Apache Spark and Amazon Redshift

Apache Spark

Customer Verified
Top Rated
113 Ratings
Score 8.4 out of 10

Amazon Redshift

Customer Verified
Top Rated
132 Ratings
Score 8.2 out of 10

Likelihood to Recommend

Apache Spark

The software appears to run more efficiently than other big data tools, such as Hadoop. Given that, Apache Spark is well-suited for querying and making sense of very large data sets. The software offers many advanced machine learning and econometrics tools, although these tools get only partial use because they take too long to run once the data sets get very large. The software is not well-suited for projects that are not big data in size, and its graphics and analytical output are subpar compared to other tools.
Thomas Young

Amazon Redshift

If the number of connections is expected to be low but the data volumes are large or projected to grow, it is a good solution, especially if the team has prior exposure to PostgreSQL. Speaking of Postgres, Redshift is based on an old PostgreSQL release, several versions behind, so developers cannot take advantage of some of the newer SQL language features. Queries still need some fine-tuning: indexes are not provided, so tuning sort keys becomes necessary. Lastly, Redshift does not enforce primary keys, so the business must stay vigilant for duplicate rows and be prepared to explain why duplication occurred.
Arthur Zubarev
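Because Redshift accepts a declared primary key without enforcing it, one common safeguard is a periodic query that looks for repeated key values. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in database so it runs anywhere; the `orders` table, its columns, and the `find_duplicate_keys` helper are hypothetical. Against Redshift you would send the same `GROUP BY ... HAVING` query through whatever driver you already use.

```python
import sqlite3


def find_duplicate_keys(conn: sqlite3.Connection, table: str, key: str) -> list[tuple]:
    """Return (key_value, count) pairs for key values that occur more than once.

    `table` and `key` are interpolated directly into the SQL, so they must be
    trusted identifiers, never user input.
    """
    query = (
        f"SELECT {key}, COUNT(*) AS n FROM {table} "
        f"GROUP BY {key} HAVING COUNT(*) > 1 ORDER BY {key}"
    )
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # No PRIMARY KEY is declared here, mimicking a Redshift key that is
    # declared but not enforced: duplicate rows are accepted silently.
    conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 10.0), (2, 25.5), (2, 25.5), (3, 7.25)],
    )
    print(find_duplicate_keys(conn, "orders", "order_id"))  # [(2, 2)]
```

Running a check like this after each load turns "explain why duplication occurred" into an alert raised before the duplicates reach reports.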

Pros

Apache Spark

  • Rich APIs for data transformation, making it very easy to transform and prepare data in a distributed environment without worrying about memory issues
  • Faster execution times compared to Hadoop and Pig Latin
  • Easy SQL interface to the same data set for people who prefer to explore data declaratively
  • Interoperability between SQL and Scala/Python-style data munging
Nitin Pasumarthy

Amazon Redshift

  • Redshift is fully managed, which matters because small teams do not have the resources to maintain a cluster. CloudWatch metrics are provided out-of-the-box, and it is easy to configure alarms.
  • Redshift's console allows you to easily inspect and manage queries, and manage the performance of the cluster.
  • Redshift is ubiquitous; many products (e.g., ETL services) integrate with it out-of-the-box.
  • Writing CSV files to S3 and querying them through Redshift Spectrum is convenient.
Gavin Hackeling

Cons

Apache Spark

  • Memory management is very weak.
  • PySpark is not as robust as Spark with Scala.
  • The Spark master needs high availability (HA); it is not as highly available as it should be.
  • Data locality should not be a necessity, though it does help performance. I would prefer not to depend on locality.
Anson Abraham

Amazon Redshift

  • It could benefit from adding data integrity and programming tools common to other database management systems.
  • Amazon Redshift is based on PostgreSQL 8.0.2, a version released in 2005. PostgreSQL has improved greatly since then, but the newer features were not implemented in Redshift, so many basic features are missing from it.
  • Primary keys can be declared but are not enforced. Referential integrity (foreign keys) can be declared but is not enforced. UNIQUE constraints are likewise informational only, and CHECK constraints are not supported.
  • IDENTITY can be declared on a column, and Redshift will put unique values into it. However, IDENTITY values in newly inserted rows are not guaranteed to be consecutive or in insertion order, so implementing a sequential number requires your own custom code.
  • At the time of this review, there were no stored procedures in Redshift. We write SQL script files, then parse them and run them one statement at a time from a Python program. This approach also let us implement execution-time error logging.
  • In SQL scripts, checking the number of affected rows requires a complicated join query against system tables or views.
  • There is no procedural control-of-flow language: no statements like IF, WHILE, DO, RAISERROR, etc.
  • On the performance of views: views do not pass a query's filter conditions through to the view's inner query, which is a potential performance problem.
  • When selecting from a view with a WHERE clause outside the view, the view's inner query is executed first without regard to the WHERE clause, and only then is the WHERE clause applied.
  • Some SQL constructs run many times faster than equivalent alternatives, so test your statements for performance early rather than late, especially when working with a large data set.
  • In one case, a DELETE with a JOIN was unacceptably slow; replacing the JOIN with a USING clause made the DELETE instantaneous.
Michael Romm
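The workaround this reviewer describes, running SQL script files one statement at a time from Python with execution-time error logging, can be sketched against any DB-API 2.0 connection. This minimal version assumes statements are separated by semicolons and that no semicolon appears inside a string literal or comment. It uses `sqlite3` only so the example runs without a cluster; `split_statements` and `run_script` are hypothetical names, and with Redshift you would pass in a connection from your own driver instead. Recording the cursor's `rowcount` also gives the affected-row count on the client side without querying system tables.

```python
import logging
import sqlite3

log = logging.getLogger("sql_runner")


def split_statements(script: str) -> list[str]:
    """Split a script on ';'.

    Naive by design: assumes no ';' appears inside string literals or comments.
    """
    return [s.strip() for s in script.split(";") if s.strip()]


def run_script(conn, script: str, stop_on_error: bool = True) -> list[tuple[str, int]]:
    """Run each statement of `script` on a DB-API connection, one at a time.

    Returns (statement, rowcount) pairs for the statements that succeeded.
    rowcount follows the DB-API convention of -1 when it is not available;
    sqlite3 reports -1 for anything other than INSERT, UPDATE, DELETE, REPLACE.
    """
    results = []
    cur = conn.cursor()
    for stmt in split_statements(script):
        try:
            cur.execute(stmt)
        except Exception:
            # Drivers raise different Error subclasses, so catch broadly and
            # log exactly which statement failed (execution-time error logging).
            log.exception("Statement failed: %s", stmt)
            conn.rollback()
            if stop_on_error:
                raise
            continue
        results.append((stmt, cur.rowcount))
        # Commit per statement so a later failure cannot undo earlier work.
        conn.commit()
    return results


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    conn = sqlite3.connect(":memory:")
    script = """
        CREATE TABLE staging (id INTEGER, status TEXT);
        INSERT INTO staging VALUES (1, 'new'), (2, 'new'), (3, 'done');
        UPDATE staging SET status = 'done' WHERE status = 'new';
    """
    for stmt, count in run_script(conn, script):
        log.info("%s -> %d row(s) affected", stmt.split()[0], count)
```

Committing after every statement is a deliberate choice for this sketch: it keeps each step's effect visible in the log, at the cost of losing all-or-nothing behavior for the script as a whole.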

Usability

Apache Spark

No score (no answers on this topic yet)

Amazon Redshift

Score 8.6 out of 10
Based on 8 answers
Just very happy with the product; it fits our needs perfectly. Amazon pioneered the cloud, and we have had a positive experience using Redshift. It is really cool to be able to see where your data is housed and to query it and perform administrative tasks with ease.
Brendan McKenna

Support

Apache Spark

No score (no answers on this topic yet)

Amazon Redshift

Score 5.7 out of 10
Based on 3 answers
Redshift support is seamless. The system is MPP (distributed), so it is highly available and always backed up by AWS, and you can also add read-only replicas (at a cost), which help overcome the limit on the number of connections.
However, it appears that AWS is not going to upgrade its engine to a newer version of PostgreSQL, which is a great pity.
Arthur Zubarev

Alternatives Considered

Apache Spark

Compared with similar technologies, Spark ends up being a one-stop shop. You can achieve so much with this one framework instead of having to stitch together multiple technologies from the Hadoop stack, all while getting incredible performance, minimal boilerplate, and the ability to write your application in the language of your choice.

Amazon Redshift

Compared with Vertica: Redshift is cheaper and integrated with AWS (a plus because the whole company was on AWS).
Compared with BigQuery: Redshift has a standard SQL interface, though I have recently heard good things about BigQuery and would try it again.
Compared with Hive: Hive is great if you are in the petabyte-plus range, but its latencies tend to be much higher than Redshift's, and it is not suited for ad-hoc queries.

Return on Investment

Apache Spark

  • It has had a very positive impact, as it helps reduce the data processing time and thus helps us achieve our goals much faster.
  • Being easy to use, it lets us adapt to the tool much faster than to others. It also runs on Hadoop, Apache Mesos, Kubernetes, standalone, or in the cloud, and can access a variety of data sources, which makes it very useful.
  • It was very easy for me to learn and use Apache Spark because I come from a Java and SQL background, and Spark shares those basic principles and uses very similar logic.
Carla Borges

Amazon Redshift

  • Redshift has had a very positive impact on our business. It has been used to provide analytics on marketing campaigns to boost revenue.
  • Redshift is instrumental in our payment collection business processes. It powers everything from who gets called to who gets sent collection emails.
Seth Goldberg

Pricing Details

Apache Spark

General

Free Trial
Free/Freemium Version
Premium Consulting/Integration Services
Entry-level setup fee?
No

Amazon Redshift

General

Free Trial
Free/Freemium Version
Premium Consulting/Integration Services
Entry-level setup fee?
No
