Why do we use Presto?
- Fast - Presto is very fast thanks to its optimized query engine, which makes it well suited for interactive analysis.
- Flexible - Presto is highly flexible because it uses a plug-and-play model for data sources. Querying and joining across different data sources (e.g. HDFS, MySQL, Kafka) is very easy with Presto.
- ANSI SQL - Presto follows ANSI SQL, the recognized SQL standard, which makes query migration easy without much overhead.
- Large fact + small dimension table joins made fast - By design, Presto outperforms most distributed query engines on this type of query.
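One common reason this join shape is fast in distributed engines is the broadcast hash join: the small dimension table is hashed once in memory, and the large fact table is streamed past it partition by partition. The following is a minimal Python sketch of that general idea, not Presto's actual implementation. The function name `broadcast_hash_join` and the `trips` and `cities` tables are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def broadcast_hash_join(fact_partitions, dimension_rows, key):
    """Inner-join each fact partition against one small in-memory dimension table."""
    # Build phase: hash the small dimension table once.
    # In a real cluster, every worker would receive a copy of this lookup.
    lookup = defaultdict(list)
    for dim_row in dimension_rows:
        lookup[dim_row[key]].append(dim_row)

    # Probe phase: stream the large fact rows through the lookup,
    # one partition at a time, without ever shuffling the fact table.
    joined = []
    for partition in fact_partitions:
        for fact_row in partition:
            for dim_row in lookup.get(fact_row[key], []):
                joined.append({**dim_row, **fact_row})
    return joined


# Hypothetical data: "trips" stands in for a large fact table split into two partitions.
trips = [
    [{"trip_id": 1, "city_id": 10, "fare": 12.5}, {"trip_id": 2, "city_id": 20, "fare": 8.0}],
    [{"trip_id": 3, "city_id": 10, "fare": 30.0}, {"trip_id": 4, "city_id": 99, "fare": 5.0}],
]
# "cities" stands in for a small dimension table that fits in memory.
cities = [{"city_id": 10, "city": "San Francisco"}, {"city_id": 20, "city": "New York"}]

result = broadcast_hash_join(trips, cities, "city_id")
```

The sketch also hints at the limit of the approach: the build side has to fit in memory, so it stops working when both tables are large.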
Cons
- Presto was not designed for large fact-to-fact joins. This is deliberate: Presto does not leverage disk and uses memory for processing, which in turn makes it fast. It is a tradeoff, though. In an ideal world, people would like to use one system for all their use cases, and Presto would become more complete by solving this problem.
- Resource allocation is not like YARN's. Presto allocates query resources through a priority queue, so a query that already takes long ends up taking even longer. This might be alleviated by giving users more control to define or override priority.
- UDF support is not available in Presto. You have to write your own functions specifically for Presto. While this is good for performance, it comes at the high cost of building exclusively for Presto and not being interoperable with other systems such as Hive and SparkSQL.
- Presto has helped scale Uber's interactive data needs. We have migrated a lot off proprietary tech like Vertica.
- Presto has helped us build data-driven applications on its stack rather than maintain a separate online/offline stack.
- Presto has helped us build data exploration tools by leveraging its strength in interactive querying, which is immensely valuable for data scientists.