Likelihood to Recommend
Apache Pig is best suited for ETL-based data processes. It performs well when handling and analyzing large amounts of data. It gives faster results than any other similar tool. It is easy to implement, and any user with some initial training or some prior SQL knowledge can work with it. Apache Pig also has a large community base globally.
Presto is for interactive simple queries, where
is for reliable processing. If you have a fact-dim join, Presto is great; however, for fact-fact joins Presto is not the solution. Presto is a great replacement for proprietary technology like Vertica.
Pros
Its performance, ease of use, and simplicity in learning and deployment.
Using this tool, we can quickly analyze large amounts of data.
It is adequate for map-reducing large datasets and fully abstracts MapReduce.

Linking, embedding links and adding images is easy enough.
Once you have become familiar with the interface, Presto becomes very quick and easy to use (but you have to practice and repeat to know what you are doing; it is not as intuitive as one would hope).
Organizing and design is fairly simple with click-and-drag parameters.
Cons
Errors from Python UDFs are not interpretable. A developer can struggle for a very long time after hitting these errors.
Being at an early stage, it still has a small community for help in related matters.
It still needs a lot of improvements. Only recently was a datetime module added for time series, which is a very basic requirement.

Presto was not designed for large fact-fact joins. This is by design: Presto does not leverage disk and uses memory for processing, which in turn makes it fast. However, this is a tradeoff. In an ideal world, people would like to use one system for all their use cases, and Presto should become exhaustive by solving this problem.
Resource allocation is not similar to YARN. Presto has priority-queue-based query resource allocation, so a query that takes long takes even longer. This might be alleviated by giving some control back to the user to define or override priority.
UDF support is not available in Presto. You have to write your own functions. While this is good for performance, it comes at a huge overhead of building exclusively for Presto and not being interoperable with other systems like Hive, SparkSQL, etc.
Usability
Apache Pig is quick and easy to implement, which makes it quite popular.
Support Rating
The documentation is adequate. I'm not sure how large an external community there is for support.
Alternatives Considered
Apache Pig might help to get things started faster at first, and it was one of the best tools years ago, but it lacks important features that are needed in the data engineering world right now. Pig also has a steeper learning curve since it uses its own language, Pig Latin, whereas Spark can be coded with Python or Java.
Presto is good for a templated design appeal. You cannot be too creative via this interface - but, the layout and options make the finalized visual product appealing to customers. The other design products I use are for different purposes and not really comparable to Presto.
Return on Investment
Higher learning curve than other similar technologies, so onboarding new engineers or changing ownership of Apache Pig code tends to be a bit of a headache.
Once the language is learned and understood, it can be relatively straightforward to write simple Pig scripts, so development can go relatively quickly with a skilled team.
As distributed technologies grow and improve, Apache Pig overall feels left in the dust and is more legacy code to support than something to actively develop with.

Presto has helped scale Uber's interactive data needs. We have migrated a lot out of proprietary tech like Vertica.
Presto has helped build data-driven applications on its stack rather than maintain a separate online/offline stack.
Presto has helped us build data exploration tools by leveraging its interactive power, and it is immensely valuable for data scientists.