Comparison of MapR versus other Hadoop distributions
December 01, 2015


Anonymous | TrustRadius Reviewer
Score 7 out of 10
Vetted Review
Verified User

Overall Satisfaction with MapR

We were implementing a fully relational database on top of HBase. This was not a "SQL on Hadoop" project that supported only a subset of SQL: we developed a database on top of HBase that was fully SQL-92 compliant and partially SQL-99 compliant. We supported all three major commercial HBase distributions (Cloudera, Hortonworks, and MapR).
  • MapR had very fast I/O throughput. Its write speed was several times faster than what we could achieve with the other Hadoop vendors (Cloudera and Hortonworks). This is because MapR does not use HDFS, which is essentially a "meta filesystem" built on top of the filesystem provided by the OS. MapR instead has its own filesystem, MapR-FS, which is a true filesystem that accesses the raw disk drives directly.
  • The MapR filesystem is very easy to integrate with other Linux filesystems. With HDFS from Apache Hadoop, you usually have to use either the HDFS API or the Hadoop/HDFS command-line utilities to interact with it. You cannot use the command-line utilities native to the host operating system, which is usually Linux, at least not easily without setting up NFS, gateways, etc. With MapR-FS, you can mount the filesystem within Linux and use standard Unix commands to manipulate files.
  • The HBase distribution provided by MapR is very close to the Apache HBase distribution. Cloudera and Hortonworks add GUIs and various other tools on top of their HBase distributions, whereas MapR's stays close to stock Apache HBase, which is nice if you are more accustomed to using Apache HBase.
  • The MapR web UI console is fairly basic. Compared with Cloudera Manager and Apache Ambari (which ships with Hortonworks), it is definitely in third place. MapR has invested heavily in filesystem performance with MapR-FS, but it should invest a bit more in making a MapR cluster easier to administer and manage.
  • MapR should tune MapR-FS to work better with HBase. Here again, MapR has invested heavily in its own proprietary technology, in this case MapR-DB. MapR-DB is a "wire-compatible" version of HBase, but it is a somewhat different beast from HBase. As a result, we ran into performance issues when running vanilla HBase on MapR-FS. Essentially, MapR-FS write throughput was so fast that it caused compaction storms in HBase. Slowing down the HBase flushes actually improved overall system throughput for HBase on MapR-FS.
  • We were selling our Hadoop RDBMS as a traditional software product, and several prospects were interested specifically because we supported MapR, which has a strong reputation for performance. I cannot comment on whether those deals closed.
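To make the point about mounting MapR-FS concrete: once the cluster filesystem is mounted, ordinary file APIs are enough, with no HDFS client involved. The sketch below is a minimal Python illustration under stated assumptions. It assumes the cluster is NFS-mounted at a hypothetical path, `/mapr/my.cluster.com`, and the helper names `stage_file` and `list_dir` are illustrative, not part of any MapR tool. On stock HDFS, the equivalent copy would typically go through `hdfs dfs -put` or the HDFS client API instead.

```python
import os
import shutil
from pathlib import Path

# Hypothetical mount point (assumption): adjust to wherever your cluster
# filesystem is actually mounted on the host.
DEFAULT_MOUNT = Path("/mapr/my.cluster.com")


def stage_file(local_file: Path, mount_root: Path = DEFAULT_MOUNT,
               dest_dir: str = "data/incoming") -> Path:
    """Copy a local file into the mounted filesystem using plain POSIX calls.

    Nothing here is MapR-specific: once mounted, the cluster filesystem is
    treated like any other directory tree.
    """
    target_dir = mount_root / dest_dir
    target_dir.mkdir(parents=True, exist_ok=True)  # comparable to `mkdir -p`
    target = target_dir / local_file.name
    shutil.copy2(local_file, target)               # comparable to `cp -p`
    return target


def list_dir(path: Path) -> list[str]:
    """Comparable to `ls`: return the sorted entry names in a directory."""
    return sorted(os.listdir(path))
```

Because the code only uses standard library file operations, the same script can be exercised against a local temporary directory before pointing it at the real mount.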
We supported all three Hadoop vendors with our Hadoop RDBMS product. Here is how I see the commercial Hadoop distribution landscape. If you need raw performance and don't mind proprietary technology, go with MapR. If you want the purest open source, go with Hortonworks. If you like web UIs and ease of use, go with Cloudera.
If you need Hadoop primarily for raw I/O speed and have a Hadoop-savvy group of engineers who don't need or like web UIs, then MapR is a great fit for you. If you are new to Hadoop or have DevOps staff who are not Hadoop gurus, choosing MapR as your Hadoop vendor will mean a steeper learning curve, as you will need to do more training and build more admin consoles for them.