Apache Spark vs. Cloudera Manager

Overview
Product	Rating	Most Used By	Product Summary	Starting Price
Apache Spark	Score 8.9 out of 10	N/A	Apache Spark is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.	N/A
Cloudera Manager	Score 9.9 out of 10	N/A	Cloudera Manager is a management application for Apache Hadoop and the enterprise data hub, from Cloudera. Its automated wizards let users quickly deploy a cluster, no matter what the scale or the deployment environment, complete with intelligent, system-based default settings.	$0.04 Hourly rate

Pricing

Editions & Modules

Apache Spark: No answers on this topic

Cloudera Manager (hourly rate per CCU):
Data Hub: $0.04/CCU
Data Engineering: $0.07/CCU
Data Warehouse: $0.07/CCU
Operational Database: $0.08/CCU
Flow Management on Data Hub: $0.15/CCU
Machine Learning: $0.17/CCU
DataFlow: $0.30/CCU

Offerings

Pricing Offerings	Apache Spark	Cloudera Manager
Free Trial	No	No
Free/Freemium Version	No	No
Premium Consulting/Integration Services	No	No

Entry-level Setup Fee

No setup fee

Additional Details

—

Pricing is per Cloudera Compute Unit (CCU), which is a combination of core and memory. The CCU prices shown for each service are estimates and may vary depending on the actual instance types. The prices shown do not include infrastructure costs, networking costs, or other related costs, which will vary depending on the services you choose and your cloud service provider.
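
As a rough illustration of how the per-CCU rates translate into a bill, the sketch below multiplies a service's hourly rate by a number of CCUs and a number of hours. The rates are the estimates listed under Editions & Modules. The simple rate x CCUs x hours formula, the function name `estimate_service_charge`, and the 16-CCU, 10-hour workload are assumptions made for illustration; the source does not say how many CCUs a given instance type consumes, and infrastructure and networking costs are excluded.

```python
# Estimated hourly CCU rates in USD, copied from the Editions & Modules list.
CCU_HOURLY_RATE_USD = {
    "Data Hub": 0.04,
    "Data Engineering": 0.07,
    "Data Warehouse": 0.07,
    "Operational Database": 0.08,
    "Flow Management on Data Hub": 0.15,
    "Machine Learning": 0.17,
    "DataFlow": 0.30,
}


def estimate_service_charge(service: str, ccus: float, hours: float) -> float:
    """Return an estimated service charge in USD (hypothetical helper).

    Assumes the charge is simply rate x CCUs x hours. Infrastructure,
    networking, and other cloud-provider costs are NOT included.
    """
    if ccus < 0 or hours < 0:
        raise ValueError("ccus and hours must be non-negative")
    rate = CCU_HOURLY_RATE_USD[service]  # raises KeyError for unknown services
    return round(rate * ccus * hours, 2)


# Hypothetical workload: 16 CCUs of Data Engineering running for 10 hours.
example_charge = estimate_service_charge("Data Engineering", ccus=16, hours=10)
print(f"Estimated charge: ${example_charge:.2f}")
```

Because the listed rates are estimates that vary by instance type, a figure produced this way is best treated as a lower bound on the total cost rather than a quote.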

Community Pulse
Considered Both Products

All reviews in this section are marked as incentivized. Excerpts ending in "…" are truncated in the source.

Reviewers who chose Apache Spark

Ananth Gouri, Assistant Professor: We used Surprise Kit for one of the other research works. It is more fine-tuned to recommendation systems and their algorithms. Apache Spark has MLlib for the majority of ML problems, whereas software like Surprise Kit is suitable for the specific task of recommendations only.

Riyaz Khan, Staff Engineer: Apache Spark is a fast-processing in-memory computing framework. It is 10 times faster than Apache Hadoop. Earlier we were using Apache Hadoop for processing data on disk, but we have now shifted to Apache Spark because of its in-memory computation capability. Also in SAP …

Steven Li, Senior Software Developer (Consultant): Other teams used to work on Apache Hadoop but our team started with Apache Spark directly.

Verified User, Anonymous: There are a few alternatives that can do the same transformation and aggregation as Apache Spark, but most of them are not able to perform parallel computation. For example, pandas is a really good tool to do that but is not parallelized; however, there are some tools that …

Surendranatha Reddy Chappidi, Senior Data Engineer: Apache Spark works in distributed mode using a cluster. Informatica and DataStage cannot scale horizontally. We can write custom code in Spark, whereas in DataStage and Informatica we can only choose from the features already provided.

Verified User, Anonymous: Apache Spark has much better performance and features compared with Hive or MapReduce-style solutions. Spark has many other features for machine learning and streaming.

Chetan Munegowda, Software Engineer: Spark is simply awesome to work on with any data sets and also has an in-memory database, which makes it very flexible.

Yogesh Mhasde, Technical Manager: 1. Apache Spark is almost 100% faster than Hadoop. 2. Apache Spark is more stable than Amazon EMR. 3. The end-to-end distributed machine learning library is more robust in Apache Spark.

Verified User, Anonymous: Databricks uses Spark as a foundation, and is also a great platform. It does bring several add-ons, which we did not feel we needed at the time we evaluated, and haven't needed since. One interesting plus in our opinion was the engineering support, which is great depending …

Verified User, Anonymous: It is easy to learn, read and to maintain. It brings the best of the Ruby on Rails framework from Java that helps to create a web service so easily. Communication is one of the most distinctive features of Apache Spark compared to alternative products. You are able to …

Shiv Shivakumar, Acquisitions Leader: We evaluated SAS alongside Apache Spark, but during the course of the proof of concept found that Apache Spark was able to support the Hadoop ecosystem and Hadoop file system much better. It was much faster at that time while having the ability to process data quickly for the …

Carla Borges, Technical Consultant, Java and PHP Developer: I prefer Apache Spark compared to Hadoop, since in my experience Spark has more usability and comes equipped with simple APIs for Scala, Python, Java and Spark SQL, as well as provides feedback in REPL format on the commands. At the same time, Apache Spark seems to have the …

Nitin Pasumarthy, Software Engineer: All the above systems work quite well on big data transformations, whereas Spark really shines with its bigger API support and its ability to read from and write to multiple data sources. Using Spark one can easily switch between declarative versus imperative versus functional …

Kartik Chavan, Data Analyst: Even with Python, MapReduce is lengthy coding. Combining Python with Apache Spark will not only shorten the code, but will also effectively increase the speed of algorithms. Occasionally, I use MapReduce, but Apache Spark will replace MapReduce very soon. It has many …

Anson Abraham, Data Czar: Versus MapReduce, it was faster and easier to manage, especially for machine learning, where MapReduce is lacking. Also, Apache Storm was slower and didn't scale as much as Spark does. Spark elasticity was easier to apply compared to Storm and MapReduce. Managing resources for …

Verified User, Anonymous: We specifically chose Spark over MapReduce to make the cluster processing faster.

Verified User, Anonymous: Spark in comparison to similar technologies ends up being a one-stop shop. You can achieve so much with this one framework instead of having to stitch and weave multiple technologies from the Hadoop stack, all while getting incredible performance, minimal boilerplate, and …

Kamesh Emani, Software Developer Intern: Apache Pig and Apache Hive provide most of the things Spark provides, but Apache Spark has more features like actions and transformations, which are easy to code. Spark uses optimization techniques, as we can select the driver program and manipulate the DAG (Directed Acyclic Graph). Python …

Verified User, Anonymous: There are a few newer frameworks for general processing like Flink and Beam, frameworks for streaming like Samza and Storm, and traditional MapReduce. I think Spark is at a sweet spot where it's clearly better than MapReduce for many workflows yet has gotten a good amount of …

Jordan Moore, Staff Consultant: Spark has primarily replaced my use of writing pure Hadoop MapReduce or Apache Pig jobs for processing data. I like the fact that I can alternate between the main programming languages that I know, Java and Python, and use those to learn the Scala API. Spark also can be …

Reviewers who chose Cloudera Manager

Ethan Tran, Senior Systems Engineer: I have not used any competitors, such as Hortonworks, because Cloudera Manager just works and meets all my customer's needs. I have only deployed Hadoop using the command line, which is not easy to use and manage.

Best Alternatives
	Apache Spark	Cloudera Manager
Small Businesses	No answers on this topic	No answers on this topic
Medium-sized Companies	Cloudera Manager Score 9.9 out of 10	Apache Spark Score 8.9 out of 10
Enterprises	IBM Analytics Engine Score 8.6 out of 10	IBM Analytics Engine Score 8.6 out of 10

User Ratings
	Apache Spark	Cloudera Manager
Likelihood to Recommend	9.0 (0 ratings)	8.5 (0 ratings)
Likelihood to Renew	10.0 (0 ratings)	8.5 (0 ratings)
Usability	8.0 (0 ratings)	- (0 ratings)
Support Rating	8.7 (0 ratings)	- (0 ratings)

User Testimonials
	Apache Spark	Cloudera Manager
Likelihood to Recommend	Apache Spark has rich APIs for regular data transformations, ML workloads, and graph workloads, whereas other systems may not have such a wide range of support. Choose it when you need to perform data transformations for big data as offline jobs, and use MongoDB-like distributed database systems for more real-time queries. (Nitin Pasumarthy, Software Engineer; incentivized)	Cloudera Manager is well suited for environments and deployments where the administrator user base is not well versed in the Apache Hadoop ecosystem or the Linux command line interface. (Charles Inglese, Senior System Engineer; incentivized)
Pros	It performs a conventional disk-based process when the data sets are too large to fit into memory, which is very useful because, regardless of the size of the data, it is always possible to store them. It has great speed and the ability to join multiple types of databases and run different types of analysis applications. This functionality is super useful as it reduces work times. Apache Spark uses the data storage model of Hadoop and can be integrated with other big data frameworks such as HBase, MongoDB, and Cassandra. This is very useful because it is compatible with multiple frameworks that the company has, and thus allows us to unify all the processes. (Carla Borges, Technical Consultant, Java and PHP Developer; incentivized)	Cloudera Manager has an easy-to-use web GUI. You can start and stop clusters and services, and it will start and stop services in a cluster in the right order. You can monitor the cluster, services, and physical host hardware as well. Cloudera Manager has an easy-to-use API that allows us to create scripts to automate the deployment process. Cloudera Manager has an option to add additional services that you can manage via the web GUI. (Ethan Tran, Senior Systems Engineer; incentivized)
Cons	Memory management is very weak. PySpark is not as robust as Scala with Spark. Spark master HA is needed; it is not as highly available as it should be. Locality should not be a necessity, but it does help; I would prefer no locality requirement. (Anson Abraham, Data Czar; incentivized)	Support for third-party Python APIs within the Cloudera Manager extension framework; more reporting/logging functionality as part of the open source distribution; support for the latest RHEL versions sooner in the release lifecycle. (Charles Inglese, Senior System Engineer; incentivized)
Likelihood to Renew	Capacity for computing data in a cluster, and fast speed. (Steven Li, Senior Software Developer (Consultant))	It meets all my customer's needs. (Ethan Tran, Senior Systems Engineer; incentivized)
Usability	If the team looking to use Apache Spark is not used to debugging and tweaking settings for jobs to ensure maximum optimization, it can be frustrating. However, the documentation and the support of the community on the internet can help resolve most issues. Moreover, it is highly configurable and it integrates with different tools (e.g., it can be used by dbt Core), which increases the scenarios where it can be used. (Verified User, Anonymous; incentivized)	No answers on this topic
Support Rating	1. It integrates very well with Scala or Python. 2. Its SQL interoperability is very easy to understand. 3. Apache Spark is way faster than the other competitive technologies. 4. The support from the Apache community for Spark is huge. 5. Execution times are faster compared to others. 6. There are a large number of forums available for Apache Spark. 7. The code for Apache Spark is simple and easy to gain access to. 8. Many organizations use Apache Spark, so many solutions are available for existing applications. (Yogesh Mhasde, Technical Manager)	No answers on this topic
Alternatives Considered	We used Surprise Kit for one of the other research works. It is more fine-tuned to recommendation systems and their algorithms. Apache Spark has MLlib for the majority of ML problems, whereas software like Surprise Kit is suitable for the specific task of recommendations only. (Ananth Gouri, Assistant Professor; incentivized)	I have not used any competitors, such as Hortonworks, because Cloudera Manager just works and meets all my customer's needs. I have only deployed Hadoop using the command line, which is not easy to use and manage. (Ethan Tran, Senior Systems Engineer; incentivized)
Return on Investment	Faster turnaround on feature development: we have seen a noticeable improvement in our agile development since using Spark. Easy adoption: having multiple departments use the same underlying technology, even if the use cases are very different, allows for more commonality amongst applications, which definitely makes the operations team happy. Performance: we have been able to make some applications run over 20x faster since switching to Spark. This has saved us time, headaches, and operating costs. (Verified User, Anonymous; incentivized)	Cloudera Manager has allowed our organization to deploy Apache Hadoop to operations quicker and with less training versus using the command line exclusively. Increased employee efficiency. Increased product adoption. (Charles Inglese, Senior System Engineer; incentivized)