What users are saying about

Amazon EMR

24 Ratings

Presto

8 Ratings

Amazon EMR

24 Ratings
<a href='https://www.trustradius.com/static/about-trustradius-scoring' target='_blank' rel='nofollow'>trScore algorithm: Learn more.</a>
Score 8.3 out of 101

Presto

8 Ratings
<a href='https://www.trustradius.com/static/about-trustradius-scoring' target='_blank' rel='nofollow'>trScore algorithm: Learn more.</a>
Score 7.5 out of 101

Add comparison

Likelihood to Recommend

Amazon EMR

If you don't have big data ..i.e petabytes of data with terabytes of data generating every day, then don't use Hadoop. Relational databases are enough for terabytes of data. Hadoop is not well suited for transactional systems or data.
No photo available

Presto

Presto is for interactive simple queries, where Hive is for reliable processing. If you have a fact-dim join, presto is great..however for fact-fact joins presto is not the solution.. Presto is a great replacement for proprietary technology like Vertica
Praveen Murugesan profile photo

Pros

  • EMR does well in managing the cost as it uses the task node cores to process the data and these instances are cheaper when the data is stored on s3. It is really cost efficient. No need to maintain any libraries to connect to AWS resources.
  • EMR is highly available, secure and easy to launch. No much hassle in launching the cluster (Simple and easy).
  • EMR manages the big data frameworks which the developer need not worry (no need to maintain the memory and framework settings) about the framework settings. It's all setup on launch time. The bootstrapping feature is great.
No photo available
  • Fast - Presto, is incredibly fast due to its optimized query engine and is well suited for interactive analysis.
  • Flexible - Presto is highly flexible as it operates with a plug and play model for data sources. Joining and query across different data sources is very easy with presto (eg. HDFS, MySQL, Kafka).
  • ANSI Sql - Presto follows ANSI SQL which is the recognized SQL language and hence helps allow easy query migration without much overhead.
  • Large Fact + Small Dimension table joins made fast - By design presto excels most distributed query engines out there in this type of queries.
Praveen Murugesan profile photo

Cons

  • Providing user friendly tools for hdfs access
  • More simpler apis for easy access and processsing
  • Memory requirenent
No photo available
  • Presto was not designed for large fact fact joins. This is by design as presto does not leverage disk and used memory for processing which in turn makes it fast.. However, this is a tradeoff..in an ideal world, people would like to use one system for all their use cases, and presto should get exhaustive by solving this problem.
  • Resource allocation is not similar to YARN and presto has a priority queue based query resource allocation..so a query that takes long takes longer...this might be alleviated by giving some more control back to the user to define priority/override.
  • UDF Support is not available in presto. You will have to write your own functions..while this is good for performance, it comes at a huge overhead of building exclusively for presto and not being interoperable with other systems like Hive, SparkSQL etc.
Praveen Murugesan profile photo

Alternatives Considered

Having one of these enterprise edition license comes at its own costs. But, the flexibility to have the cluster spin up with the workbenches and code snippets on the same is really beneficial. Especially, if one had to move out of EMR and consider an option which reduces the debugging time in establishing connections to AWS resources, I would love to used the mentioned three resources on EC2. This would definitely make the processing time to reduce as there is a flexibility to test real time and execute the code snippet and look at the performance and monitor the snippet in real time.
No photo available
I think Presto is one of the best solutions out there today at the cutting edge for interactive query analysis. One of the challenges is presto is a niche tool for the interactive query use case and doesn't have the knobs and whistles as much as Spark. In the foreseeable future if they are able to make presto work without the need for Hive, solving all the gaps it could be game changing and can be a direct threat to spark
Praveen Murugesan profile photo

Return on Investment

  • It was obviously cheaper and convenient to use as most of our data processing and pipelines are on AWS. It was fast and readily available with a click and that saved a ton of time rather than having to figure out the down time of the cluster if its on premises.
  • It saved time on processing chunks of big data which had to be processed in short period with minimal costs. EMR solved this as the cluster setup time and processing was simple, easy, cheap and fast.
  • It had a negative impact as it was very difficult in submitting the test jobs as it lags a UI to submit spark code snippets.
No photo available
  • Presto has helped scale Uber's interactive data needs. We have migrated a lot out of proprietary tech like Vertica.
  • Presto has helped build data driven applications on its stack than maintain a separate online/offline stack.
  • Presto has helped us build data exploration tools by leveraging it's power of interactive and is immensely valuable for data scientists.
Praveen Murugesan profile photo

Pricing Details

Amazon EMR

General
Free Trial
Free/Freemium Version
Premium Consulting/Integration Services
Entry-level set up fee?
No
Additional Pricing Details

Presto

General
Free Trial
Free/Freemium Version
Premium Consulting/Integration Services
Entry-level set up fee?
No
Additional Pricing Details