What users are saying about
103 Ratings
15 Ratings
103 Ratings
<a href='https://www.trustradius.com/static/about-trustradius-scoring' target='_blank' rel='nofollow'>trScore algorithm: Learn more.</a>
Score 8.5 out of 101
15 Ratings
<a href='https://www.trustradius.com/static/about-trustradius-scoring' target='_blank' rel='nofollow'>trScore algorithm: Learn more.</a>
Score 8 out of 101

Add comparison

Likelihood to Recommend

Apache Spark

The software appears to run more efficiently than other big data tools, such as Hadoop. Given that, Apache Spark is well-suited for querying and trying to make sense of very, very large data sets. The software offers many advanced machine learning and econometrics tools, although these tools are used only partially because very large data sets require too much time when the data sets get too large. The software is not well-suited for projects that are not big data in size. The graphics and analytical output are subpar compared to other tools.
Thomas Young profile photo

MapR

If you need Hadoop and just need raw speed for I/O and have a Hadoop savvy group of engineers who don't need/like web UIs, then MapR is a great fit for you. If you are new to Hadoop or have DevOps folks that are not Hadoop gurus, choosing MapR as your Hadoop vendor will have a steeper learning curve as you will need to do more training and build more admin consoles for them.
No photo available

Pros

  • Apache Spark makes processing very large data sets possible. It handles these data sets in a fairly quick manner.
  • Apache Spark does a fairly good job implementing machine learning models for larger data sets.
  • Apache Spark seems to be a rapidly advancing software, with the new features making the software ever more straight-forward to use.
Thomas Young profile photo
  • MapR had very fast I/O throughput. The write speed was several times faster than what we could achieve with the other Hadoop vendors (Cloudera and Hortonworks). This is because MapR does not use HDFS, which is essentially a "meta filesystem". HDFS is built on top of the filesystem provided by the OS. MapR has their filesystem called MapR-FS, which is a true filesystem and accesses the raw disk drives.
  • The MapR filesystem is very easy to integrate with other Linux filesystems. When working with HDFS from Apache Hadoop, you usually have to use either the HDFS API or various Hadoop/HDFS command line utilities to interact with HDFS. You cannot use command line utilities native to the host operation system, which is usually Linux. At least, it is not easily done without setting up NFS, gateways, etc. With MapR-FS, you can mount the filesystem within Linux and use the standard Unix commands to manipulate files.
  • The HBase distribution provided by MapR is very similar to the Apache HBase distribution. Cloudera and Hortonworks add GUIs and other various tools on top of their HBase distributions. The MapR HBase distribution is very similar to the Apache distribution, which is nice if you are more accustomed to using Apache HBase.
No photo available

Cons

  • Apache Spark requires some advanced ability to understand and structure the modeling of big data. The software is not user-friendly.
  • The graphics produced by Apache Spark are by no means world-class. They sometimes appear high-schoolish.
  • Apache Spark takes an enormous amount of time to crunch through multiple nodes across very large data sets. Apache Spark could improve this by offering the software in a more interactive programming environment.
Thomas Young profile photo
  • I think MapR's main problem is name recognition. Hortonworks and Cloudera both are big names in the industry, but their deployment mechanisms are a little more difficult to use, especially when trying to fully automate it's deployment.
  • Documentation could always be better. But really, if that's your main weakness, it's everybody's weakness.
No photo available

Alternatives Considered

All the above systems work quite well on big data transformations whereas Spark really shines with its bigger API support and its ability to read from and write to multiple data sources. Using Spark one can easily switch between declarative versus imperative versus functional type programming easily based on the situation. Also it doesn't need special data ingestion or indexing pre-processing like Presto. Combining it with Jupyter Notebooks (https://github.com/jupyter-incubator/sparkmagic), one can develop the Spark code in an interactive manner in Scala or Python
Nitin Pasumarthy profile photo
When we were shopping, Mapr had the momentum, high availability even on Hadoop 1.x, an improved file system and better a central control system. Now it looks like the situation has changed a lot.
No photo available

Return on Investment

  • Workflow process using spark went from 1 day to 2 hours
  • Spark Streaming allowed for quick determiniation of data validity
  • spark on yarn was good for manangement. But Spark with Kubernetes was easier to use.
Anson Abraham profile photo
  • Less manual intervention for maintaining a cluster.
No photo available

Pricing Details

Apache Spark

General
Free Trial
Free/Freemium Version
Premium Consulting/Integration Services
Entry-level set up fee?
No
Additional Pricing Details

MapR

General
Free Trial
Free/Freemium Version
Premium Consulting/Integration Services
Entry-level set up fee?
No
Additional Pricing Details