Likelihood to Recommend
Well suited: Most local dataset runs and non-production systems; scalability is not a problem at all. Being able to include data from multiple types of data sources is an added advantage. MLlib is a decent built-in library that can be used for most ML tasks.
Less appropriate: We worked on a recommender system (RecSys) whose music dataset was over 300 GB in size, and we ran into memory issues, including memory errors on a few occasions. MLlib also lacks support for advanced analytics and deep-learning frameworks. The internals of Apache Spark are very hard for beginners to understand.
We just need to refresh our data once a day for our unique use case, which allows the complete online system to run on extracts. For us, this is critical because our daylight hours are spent focusing on new updates and implementations rather than worrying about excessive database traffic (which would be required with a direct connection to the online system). The process of importing extracts is straightforward and sturdy enough to handle massive amounts of data.
Pros
- Apache Spark makes processing very large data sets possible, and it handles them fairly quickly.
- Apache Spark does a fairly good job implementing machine learning models for larger data sets.
- Apache Spark seems to be rapidly advancing software, with new features making it ever more straightforward to use.

- Tableau Online is completely cloud-based, so reports and dashboards are accessible even on the go. One doesn't always need the office laptop to access the reports.
- The visualizations are interactive, and one can quickly change the level at which to view the information. For example, one person might be more interested in country-level performance than client-level performance. This is intuitive, and one doesn't need to create multiple reports for the same data.
- The feature to ask questions in plain English is great and helpful. For quick ad hoc fact checks, one can simply type what they are looking for, and the natural language processing algorithms under the hood parse the query, interpret it, and fetch the results in visual form.

Cons
- Memory management is very weak.
- PySpark is not as robust as Scala with Spark.
- Spark master HA is needed; it is not as highly available as it should be.
- Locality should not be a necessity, although it does help performance. We would prefer no locality requirement.

- Can be a steep learning curve for new users.
- Modeling and building algorithms aren't always intuitive and take some testing and retesting to ensure they work as they should.
- Inability to integrate easily with our HRIS platform. Reports are pulled from HRIS at various intervals and uploaded into Tableau.

Likelihood to Renew
Capacity for computing data in a cluster, and fast speed.
Steven Li Senior Software Developer (Consultant)
Usability
The only thing I dislike about Spark's usability is the learning curve: there are many actions and transformations. However, its wide range of uses for ETL processing, its ease of integration, and its multi-language support make this library a powerhouse for your data science solutions. It has especially aided us with its lightning-fast processing times.
From an end-user perspective, Tableau Online is overall very easy to navigate once you get used to it. My only complaint is that when expanding or contracting a graph, the "plus" and "minus" at the bottom left are sometimes hidden; they should always be visible. From a builder perspective, it can take some getting used to, but the sheer depth of customization makes it all worthwhile.
Support Rating
1. It integrates very well with Scala or Python.
2. Its SQL interoperability is very easy to understand.
3. Apache Spark is much faster than competing technologies.
4. Support from the Apache community for Spark is very strong.
5. Execution times are faster compared to others.
6. There are a large number of forums available for Apache Spark.
7. Code for Apache Spark is readily available and easy to access.
8. Many organizations use Apache Spark, so many solutions are available for existing applications.
At times when the system is down, support has always been quick to notify us and keep us apprised of the latest developments. It's crucial for our system to always be available, but when emergencies have arisen, I don't recall a time when Tableau Online Support hasn't been able to address our concerns in a timely manner.
Alternatives Considered
All the above systems work quite well on big data transformations, whereas Spark really shines with its broader API support and its ability to read from and write to multiple data sources. Using Spark, one can easily switch between declarative, imperative, and functional programming styles based on the situation. It also doesn't need special data ingestion or indexing pre-processing like Presto. Combining it with Jupyter Notebooks (https://github.com/jupyter-incubator/sparkmagic), one can develop Spark code interactively in Scala or Python.
Google's dashboard suite is very user-friendly, and anyone can edit and make changes with very little knowledge or practice. But nothing I've worked with compares to the customization and multiple streams of data that Tableau offers in a user-friendly package. It's a really cool piece of software, and I would choose it again.
Return on Investment
- Faster turnaround on feature development: we have seen a noticeable improvement in our agile development since using Spark.
- Easy adoption: having multiple departments use the same underlying technology, even if the use cases are very different, allows for more commonality amongst applications, which definitely makes the operations team happy.
- Performance: we have been able to make some applications run over 20x faster since switching to Spark. This has saved us time, headaches, and operating costs.

- When we release new products, we are now able to quickly see data and toggle between current and previous periods to see performance.
- Generating new reports requires less IT time.
- Data can be shared across many different device types.
- We now have an integration that lets our customers extract data from our software more easily; this was a big ask from our customers.