What users are saying about

Apache Spark (Customer Verified)

97 Ratings
Score 8.6 out of 10 (trScore)

Presto

8 Ratings
Score 7.5 out of 10 (trScore)


Likelihood to Recommend

Apache Spark

Apache Spark has rich APIs for regular data transformations, ML workloads, and graph workloads, whereas other systems may not offer such a wide range of support. Choose it when you need to perform data transformations on big data as offline jobs; use MongoDB-like distributed database systems for more real-time queries.
Nitin Pasumarthy

Presto

Presto is for simple interactive queries, whereas Hive is for reliable processing. If you have a fact-dimension join, Presto is great; however, for fact-fact joins Presto is not the solution. Presto is a great replacement for proprietary technology like Vertica.
Praveen Murugesan

Pros

  • Rich APIs for data transformation, making it very easy to transform and prepare data in a distributed environment without worrying about memory issues
  • Faster execution times compared to Hadoop and Pig Latin
  • Easy SQL interface to the same data set for people who are comfortable exploring data in a declarative manner
  • Interoperability between SQL and Scala/Python-style data munging
Nitin Pasumarthy
  • Fast - Presto is incredibly fast due to its optimized query engine and is well suited for interactive analysis.
  • Flexible - Presto is highly flexible, as it uses a plug-and-play model for data sources. Joining and querying across different data sources (e.g., HDFS, MySQL, Kafka) is very easy with Presto.
  • ANSI SQL - Presto follows ANSI SQL, the standard SQL dialect, which allows easy query migration without much overhead.
  • Large fact + small dimension table joins made fast - By design, Presto outperforms most distributed query engines on this type of query.
Praveen Murugesan
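Why a large fact table joined to a small dimension table is the fast case comes down to the shape of the join: the small side can be held in memory as a hash table while the large side is streamed past it once. The sketch below is a minimal, single-process Python illustration of an in-memory hash join, assuming rows are plain dicts and that the dimension table fits in memory; the function and field names are hypothetical, and this is not Presto's implementation, which distributes both phases across workers. When both sides are large (a fact-fact join), the build side no longer fits in memory, which is the limitation raised in the cons.

```python
from collections import defaultdict


def hash_join(dim_rows, fact_rows, dim_key, fact_key):
    """Inner-join fact rows to dimension rows where fact[fact_key] == dim[dim_key].

    Hypothetical helper for illustration only.
    """
    # Build phase: index the (small) dimension table in memory.
    index = defaultdict(list)
    for d in dim_rows:
        index[d[dim_key]].append(d)

    # Probe phase: stream the (large) fact table once; each row costs one
    # dictionary lookup and fact rows are never all held in memory.
    for f in fact_rows:
        for d in index.get(f[fact_key], ()):
            yield {**f, **d}
```

For example, joining a generator of trip rows against a two-row country table with `list(hash_join(dims, trips, "country_id", "country_id"))` keeps only the country table resident; fact rows with no matching key are dropped, as in an SQL inner join.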

Cons

  • Data visualization.
  • Still waiting for web development support for small apps that use Spark as backbone middleware and HDFS as the data retrieval file system.
  • The available transformations and actions are limited, so the API must be modified to support more features.
Kamesh Emani
  • Presto was not designed for large fact-fact joins. This is by design: Presto does not leverage disk and uses memory for processing, which in turn makes it fast. However, this is a tradeoff. In an ideal world, people would like to use one system for all their use cases, and Presto should become more complete by solving this problem.
  • Resource allocation is not similar to YARN; Presto uses priority-queue-based resource allocation for queries, so a query that takes long takes even longer. This might be alleviated by giving users more control to define or override priority.
  • UDF support is not available in Presto. You will have to write your own functions; while this is good for performance, it comes with the huge overhead of building exclusively for Presto, and those functions are not interoperable with other systems like Hive, Spark SQL, etc.
Praveen Murugesan
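The "a query that takes long takes longer" effect can be illustrated with a toy scheduler that always runs whichever query has accumulated the least run time. This is a minimal sketch, assuming priority is derived purely from accumulated run time in fixed quanta (in the spirit of a multilevel feedback queue); the function name, the work units, and the quantum are hypothetical, and it does not reproduce Presto's actual scheduler.

```python
import heapq
import itertools


def schedule(queries, quantum=1):
    """Simulate least-accumulated-time-first scheduling.

    queries: dict mapping query name -> total work units it needs.
    Returns a dict mapping query name -> clock time at which it finished.
    """
    order = itertools.count()  # tie-breaker: FIFO among equal accumulated time
    heap = [(0, next(order), name) for name in queries]
    heapq.heapify(heap)
    remaining = dict(queries)
    clock = 0
    finished = {}
    while heap:
        # Pick the query that has used the least time so far.
        used, _, name = heapq.heappop(heap)
        step = min(quantum, remaining[name])
        clock += step
        remaining[name] -= step
        if remaining[name] == 0:
            finished[name] = clock
        else:
            # Re-queue with more accumulated time, i.e. lower priority.
            heapq.heappush(heap, (used + step, next(order), name))
    return finished
```

With `schedule({"long": 5, "short_a": 1, "short_b": 1})` the short queries finish at times 2 and 3 while the long query finishes at 7 instead of 5, because every quantum it consumes pushes it behind queries that have used less time.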

Alternatives Considered

Apache Pig and Apache Hive provide most of what Spark provides, but Apache Spark has more features, like actions and transformations, which are easy to code. Spark allows optimization because we can select the driver program and manipulate the DAG (Directed Acyclic Graph). Python can even be used for data transformations, but it requires a lot of coding compared to Spark, and it is much slower.
Kamesh Emani
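The transformation/action split works because transformations are lazy: they only record a plan, and an action triggers execution. The sketch below is a minimal single-machine Python model of that idea, assuming a linear chain of steps rather than a full DAG; `LazyDataset`, `map`, `filter`, `plan`, and `collect` are hypothetical names chosen to echo Spark's vocabulary, and this is not Spark's API. Real Spark records lineage as a DAG across a cluster and can plan it before anything runs.

```python
class LazyDataset:
    """Toy model: transformations record steps, an action executes them."""

    def __init__(self, source, steps=()):
        self._source = source        # underlying iterable, not read yet
        self._steps = tuple(steps)   # recorded (kind, fn) pairs: the plan

    # Transformations: return a new dataset with one more recorded step.
    def map(self, fn):
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    def plan(self):
        """List the recorded step kinds without executing anything."""
        return [kind for kind, _ in self._steps]

    # Action: only now is the source read and each step applied per item.
    def collect(self):
        out = []
        for item in self._source:
            keep = True
            for kind, fn in self._steps:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):  # "filter" step rejected the item
                    keep = False
                    break
            if keep:
                out.append(item)
        return out
```

For instance, `LazyDataset(range(10)).map(lambda x: x * x).filter(lambda x: x % 2 == 0)` does no work until `.collect()` is called, which returns `[0, 4, 16, 36, 64]`; until then only `plan()` (`["map", "filter"]`) is available, which is the point at which an engine like Spark has room to optimize.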
I think Presto is one of the best solutions out there today at the cutting edge of interactive query analysis. One of the challenges is that Presto is a niche tool for the interactive query use case and doesn't have as many bells and whistles as Spark. In the foreseeable future, if they are able to make Presto work without the need for Hive and close all the gaps, it could be game-changing and a direct threat to Spark.
Praveen Murugesan

Return on Investment

  • Switching from Pig Latin to Apache Spark sped up overall development time, and resource utilization has also gone up.
  • Our offline jobs also run faster than on traditional MapReduce-like systems.
  • Integration with Jupyter-like notebook environments makes the development experience more pleasant and lets us iterate much faster.
Nitin Pasumarthy
  • Presto has helped scale Uber's interactive data needs. We have migrated a lot away from proprietary tech like Vertica.
  • Presto has helped us build data-driven applications on its stack rather than maintain separate online and offline stacks.
  • Presto has helped us build data exploration tools by leveraging its interactive query power, which is immensely valuable for data scientists.
Praveen Murugesan

Pricing Details

Apache Spark

General
Free Trial
Free/Freemium Version
Premium Consulting/Integration Services
Entry-level set up fee?
No
Additional Pricing Details

Presto

General
Free Trial
Free/Freemium Version
Premium Consulting/Integration Services
Entry-level set up fee?
No
Additional Pricing Details