What users are saying about Apache Spark and Amazon Redshift
Apache Spark: 103 ratings, trScore 8.5 out of 10
Amazon Redshift: 90 ratings, trScore 8.5 out of 10

Likelihood to Recommend

Apache Spark

The software appears to run more efficiently than other big data tools, such as Hadoop. Given that, Apache Spark is well suited to querying and making sense of very large data sets. It offers many advanced machine learning and econometrics tools, although we use these tools only partially because they take too long to run once the data sets get very large. The software is not well suited to projects that are not big data in scale. Its graphics and analytical output are subpar compared to other tools.
Thomas Young

Amazon Redshift

Very well suited for aggregating and denormalizing data when you need a reporting environment, and it can provide extremely fast querying for analytical purposes. It is also very nice not to have in-house responsibility for sensitive data. It is not appropriate for a transactional system (though, obviously, that is not what it is built for). You must keep in mind what data you are syncing up to the cloud and scrub it beforehand if necessary; that is always something to be mindful of.
Brendan McKenna
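The reviewer's point about aggregating and denormalizing data for a reporting environment can be illustrated with a small sketch. It is plain Python rather than Redshift SQL, and the tables and columns (`customers`, `orders`, `region`, `amount`) are hypothetical; it only shows the idea of flattening a fact table against a dimension table into wide rows and then aggregating them, which is the kind of work a reporting warehouse is built for.

```python
from collections import defaultdict

# Hypothetical normalized source tables (e.g., exported from a transactional system).
customers = {
    1: {"name": "Acme", "region": "EU"},
    2: {"name": "Globex", "region": "US"},
}
orders = [
    {"order_id": 100, "customer_id": 1, "amount": 250.0},
    {"order_id": 101, "customer_id": 2, "amount": 75.5},
    {"order_id": 102, "customer_id": 1, "amount": 30.0},
]


def denormalize(orders, customers):
    """Join each order with its customer's attributes into one wide reporting row."""
    wide_rows = []
    for order in orders:
        customer = customers[order["customer_id"]]
        wide_rows.append(
            {**order, "customer_name": customer["name"], "region": customer["region"]}
        )
    return wide_rows


def total_by(rows, key):
    """Sum order amounts per value of `key` (GROUP BY key with SUM(amount) in SQL terms)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["amount"]
    return dict(totals)


report_rows = denormalize(orders, customers)
print(total_by(report_rows, "region"))  # {'EU': 280.0, 'US': 75.5}
```

In Redshift itself, the flattening step would typically be expressed as a `CREATE TABLE AS` or `INSERT INTO ... SELECT` with a join; the sketch avoids committing to any particular schema.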

Pros

Apache Spark

  • Ease of use: the Spark API requires minimal boilerplate, and applications can be written in a variety of languages, including Python, Scala, and Java.
  • Performance: for most applications, we have found that jobs perform better on Spark than on other distributed processing technologies such as MapReduce, Hive, and Pig.
  • Flexibility: the framework comes with support for streaming, batch processing, SQL queries, machine learning, and more, so it can be used in a variety of applications without integrating many other distributed processing technologies.

Amazon Redshift

  • Extremely fast querying, which allows for concurrent analysis.
  • PostgreSQL-based syntax, which allows developers with a SQL background to begin working with the data easily.
  • Multiple output formats including JSON.
  • Safe, easy, and reliable backups.
Brendan McKenna

Cons

Apache Spark

  • Resource heavy: jobs in general can be very memory-intensive, and you will want the nodes in your cluster to reflect that.
  • Debugging: it has improved with every release, but it can still be difficult to debug an error because of ambiguous or misleading exceptions and stack traces.

Amazon Redshift

  • SQL syntax support is not 100% complete, which can lead to frustrating situations when developing a query.
  • No support for database keys.
  • No stored procedure support.
Brendan McKenna

Usability

Apache Spark: no score (no answers on this topic yet)
Amazon Redshift: 10.0 (based on 1 answer)
We are just very happy with the product; it fits our needs perfectly. Amazon pioneered the cloud, and we have had a positive experience using Redshift. It is really cool to be able to see where your data is housed and to query it and perform administrative tasks with ease.
Brendan McKenna

Alternatives Considered

Apache Pig and Apache Hive provide most of what Spark provides, but Apache Spark has additional features, such as actions and transformations, that are easy to code. Spark also lends itself to optimization, because we can choose the driver program and manipulate the DAG (directed acyclic graph). Python can even be used for data transformations, but it requires a lot more code than Spark, and it is also much slower.
Kamesh Emani
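To make the distinction between transformations and actions concrete, here is a minimal sketch in plain Python. The `LazyDataset` class is hypothetical and is not the Spark API (in PySpark the corresponding calls would be `rdd.map`, `rdd.filter`, `rdd.collect`, and `rdd.count`); it only models the idea that transformations record steps in a plan while actions execute that plan. The plan here is a linear chain, the simplest case of the DAG the reviewer mentions.

```python
from typing import Any, Callable, Iterable, List, Tuple


class LazyDataset:
    """Toy model of Spark-style lazy evaluation (hypothetical, not the Spark API).

    Transformations (map, filter) only record a step in the plan;
    actions (collect, count) run the recorded plan over the source data.
    The source must be re-iterable (e.g., a list or range).
    """

    def __init__(self, source: Iterable[Any], plan: Tuple[Tuple[str, Callable], ...] = ()):
        self._source = source
        self._plan = plan  # ordered lineage of (operation name, function)

    # --- transformations: return a new dataset and do no work yet ---
    def map(self, fn: Callable[[Any], Any]) -> "LazyDataset":
        return LazyDataset(self._source, self._plan + (("map", fn),))

    def filter(self, pred: Callable[[Any], bool]) -> "LazyDataset":
        return LazyDataset(self._source, self._plan + (("filter", pred),))

    def explain(self) -> List[str]:
        """List the recorded steps without executing them."""
        return [name for name, _ in self._plan]

    # --- actions: execute the whole plan in a single pass over the source ---
    def collect(self) -> List[Any]:
        results = []
        for item in self._source:
            keep = True
            for name, fn in self._plan:
                if name == "map":
                    item = fn(item)
                elif not fn(item):  # a "filter" step rejected this item
                    keep = False
                    break
            if keep:
                results.append(item)
        return results

    def count(self) -> int:
        return len(self.collect())


calls = []  # records each time the map function actually runs


def square(x: int) -> int:
    calls.append(x)
    return x * x


pipeline = LazyDataset(range(10)).map(square).filter(lambda x: x % 2 == 0)
print(pipeline.explain())  # ['map', 'filter']
print(len(calls))          # 0: no transformation has run yet
print(pipeline.collect())  # [0, 4, 16, 36, 64]
```

Because nothing runs until an action is called, an engine can inspect the whole chain first and, for example, apply the map and filter together in one pass over the data, as `collect` does here. That deferred, whole-plan view is what makes the optimizations the reviewer attributes to Spark possible.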
Some organizations use PostgreSQL as an OLAP store. PostgreSQL offers a modern SQL dialect, data types, and features that Redshift lacks, and RDS is a great managed PostgreSQL product. However, PostgreSQL is a poor choice for a data warehouse. Its row-oriented storage requires careful schema and index design to ensure that analytical queries perform adequately.
Gavin Hackeling
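The contrast between row-oriented and column-oriented storage can be sketched in plain Python by modeling the two layouts with lists and dicts and counting the fields each must touch to compute a single-column aggregate. The table is hypothetical, and the "fields read" counter is only a rough stand-in for disk I/O, not a model of how PostgreSQL or Redshift actually lay out pages on disk.

```python
from typing import Dict, List, Tuple

# Hypothetical sales table with three columns: (order_id, region, amount).
rows: List[Tuple[int, str, float]] = [
    (1, "EU", 250.0),
    (2, "US", 75.5),
    (3, "EU", 30.0),
    (4, "APAC", 120.0),
]

# Row-oriented layout: each record is stored together (like PostgreSQL tables).
# Column-oriented layout: each column is stored together (like Redshift).
columns: Dict[str, list] = {
    "order_id": [r[0] for r in rows],
    "region": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}


def sum_amount_row_store(rows):
    """SUM(amount) on a row store: every full record is read to reach one field."""
    fields_read = 0
    total = 0.0
    for record in rows:
        fields_read += len(record)  # the whole record is loaded
        total += record[2]
    return total, fields_read


def sum_amount_column_store(columns):
    """SUM(amount) on a column store: only the 'amount' column is read."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


print(sum_amount_row_store(rows))        # (475.5, 12)
print(sum_amount_column_store(columns))  # (475.5, 4)
```

With three columns, the row layout reads three times as many fields for the same answer; on a wide warehouse table, the gap grows with the column count, which is why a row store needs careful schema and index design to keep analytical queries fast.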

Return on Investment

Apache Spark

  • By learning Spark, we can become certified and/or provide sound recommendations on, or implementations of, Spark solutions.
  • With our background in Hadoop distributed processing, it has been easy to understand and diagnose how Spark transfers data within a cluster, especially when using YARN as the resource manager and HDFS as the data source.
  • Staying up to date with the latest changes to Spark has become a repetitive task. While most Hadoop distributions only support Spark 1.6 at the moment, Spark 2.0 introduces some useful features, but adopting them requires rewriting existing applications.
Jordan Moore

Amazon Redshift

  • Allowed us to easily analyze business operations and facilitate A/B testing.
  • Allowed us to quickly answer complex questions about the company's data.
  • Took a lot of work to fix some issues once we reached the limits of its usability (specifically, we had too many writes).

Pricing Details

Apache Spark

Entry-level setup fee: No

Amazon Redshift

Entry-level setup fee: No