Apache Spark vs. IBM watsonx.data

Apache Spark

165 Reviews and Ratings

IBM watsonx.data

59 Reviews and Ratings

Overview
Product	Rating	Most Used By	Product Summary	Starting Price
Apache Spark	Score 8.9 out of 10	N/A	Apache Spark is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.	N/A
IBM watsonx.data	Score 8.8 out of 10	N/A	Watsonx.data is presented as an open, hybrid and governed data store that makes it possible for enterprises to scale analytics and AI with a fit-for-purpose data store, built on an open lakehouse architecture, supported by querying, governance and open data formats to access and share data.	N/A

Pricing
	Apache Spark	IBM watsonx.data
Editions & Modules	No answers on this topic	No answers on this topic

Offerings
	Apache Spark	IBM watsonx.data
Free Trial	No	Yes
Free/Freemium Version	No	No
Premium Consulting/Integration Services	No	No
Entry-level Setup Fee	No setup fee	No setup fee
Additional Details	-	-


Community Pulse
Considered Both Products (every review in this section is marked Incentivized)

Reviewers who chose Apache Spark
Ananth Gouri, Assistant Professor: We used Surprise Kit for one of our other research projects. It is more fine-tuned to recommendation systems and their algorithms. Apache Spark has MLlib for the majority of ML problems, whereas software like Surprise Kit is suitable only for the specific task of recommendations.
Riyaz Khan, Staff Engineer: Apache Spark is a fast in-memory computing framework. It is 10 times faster than Apache Hadoop. Earlier we were using Apache Hadoop for processing data on disk, but now we have shifted to Apache Spark because of its in-memory computation capability. Also in SAP …
Steven Li, Senior Software Developer (Consultant): Other teams used to work on Apache Hadoop, but our team started with Apache Spark directly.
Verified User, Anonymous: There are a few alternatives that can do the same transformations and aggregations as Apache Spark, but most of them cannot perform parallel computation. For example, pandas is a really good tool for that but is not parallelized; however, there are some tools that …
Surendranatha Reddy Chappidi, Senior Data Engineer: Apache Spark works in distributed mode using a cluster; Informatica and DataStage cannot scale horizontally. We can write custom code in Spark, whereas in DataStage and Informatica we can only choose from the features already provided.
Verified User, Anonymous: Apache Spark has much better performance and features compared with Hive or MapReduce-style solutions. Spark has many other features for machine learning and streaming.
Chetan Munegowda, Software Engineer: Spark is simply awesome to work with on any data set, and it also has an in-memory database, which makes it very flexible.
Yogesh Mhasde, Technical Manager: 1. Apache Spark is almost 100% faster than Hadoop. 2. Apache Spark is more stable than Amazon EMR. 3. The end-to-end distributed machine learning library is more robust in Apache Spark.
Verified User, Anonymous: Databricks uses Spark as a foundation and is also a great platform. It brings several add-ons, which we did not feel we needed when we evaluated it, and haven't needed since. One interesting plus in our opinion was the engineering support, which is great depending …
Verified User, Anonymous: It is easy to learn, read, and maintain. It brings the best of the Ruby on Rails framework from Java, which makes it very easy to create a web service. Communication is one of the most distinctive features of Apache Spark compared to alternative products. You are able to …
Shiv Shivakumar, Acquisitions Leader: We evaluated SAS alongside Apache Spark, but during the proof of concept we found that Apache Spark supported the Hadoop ecosystem and the Hadoop file system much better. It was much faster at that time while having the ability to process data quickly for the …
Carla Borges, Technical Consultant (Java Developer and PHP Developer): I prefer Apache Spark to Hadoop, since in my experience Spark is more usable and comes with simple APIs for Scala, Python, Java, and Spark SQL, and it provides feedback on commands in a REPL. At the same time, Apache Spark seems to have the …
Nitin Pasumarthy, Software Engineer: All the above systems work quite well for big data transformations, whereas Spark really shines with its broader API support and its ability to read from and write to multiple data sources. Using Spark, one can easily switch between declarative versus imperative versus functional …
Kartik Chavan, Data Analyst: Even with Python, MapReduce requires lengthy code. Combining Python with Apache Spark not only shortens the code but also effectively increases the speed of algorithms. Occasionally I use MapReduce, but Apache Spark will replace MapReduce very soon. It has many …
Anson Abraham, Data Czar: Compared with MapReduce, it was faster and easier to manage, especially for machine learning, where MapReduce is lacking. Also, Apache Storm was slower and didn't scale as well as Spark does. Spark elasticity was easier to apply compared to Storm and MapReduce. Managing resources for …
Verified User, Anonymous: We specifically chose Spark over MapReduce to make cluster processing faster.
Verified User, Anonymous: Spark, in comparison to similar technologies, ends up being a one-stop shop. You can achieve so much with this one framework instead of having to stitch and weave together multiple technologies from the Hadoop stack, all while getting incredible performance, minimal boilerplate, and …
Kamesh Emani, Software Developer Intern: Apache Pig and Apache Hive provide most of the things Spark provides, but Apache Spark has more features, like actions and transformations, which are easy to code. Spark uses optimization techniques, as we can select the driver program and manipulate the DAG (Directed Acyclic Graph). Python …
Verified User, Anonymous: There are a few newer frameworks for general processing like Flink and Beam, frameworks for streaming like Samza and Storm, and traditional MapReduce. I think Spark is at a sweet spot where it's clearly better than MapReduce for many workflows yet has gotten a good amount of …
Jordan Moore, Staff Consultant: Spark has largely replaced my use of writing pure Hadoop MapReduce or Apache Pig jobs for processing data. I like that I can alternate between the main programming languages I know (Java and Python) and use those to learn the Scala API. Spark also can be …

Reviewers who chose IBM watsonx.data
Lalit Goel, Deputy Manager: It is better than competitors, but not always; it depends on how we use it in our environment. IBM watsonx.data has strong data governance.
Verified User, Anonymous: Snowflake is a more mature, simpler-to-use product, but watsonx.data has a more open architecture and is better for hybrid cloud environments. In addition, watsonx.data is part of an entire watsonx platform that offers many advantages over its closest competitors. The single …
Victor Ngoma, Manager: 360 Cloud Accounting
Verified User, Anonymous: We were already using watsonx Orchestrate, so it was easier to incorporate this into our existing infrastructure.
Verified User, Anonymous: IBM watsonx.ai and IBM watsonx.governance
Verified User, Anonymous: The three pair nicely together to create my own RAG solution in a controlled manner.
Murali Madhanagopal, Architect: watsonx supports open data types and ease of use.
Verified User, Anonymous: We chose IBM watsonx.data for our organization because it has open-source support.
Rakesh Kumar, Principal Solution Architect: IBM watsonx Orchestrate
Jim McDonough, President: IBM watsonx.data stacks up against Snowflake very well. It comes in at a lower price. Also, you can run IBM watsonx.data on any cloud or on premises. Much more flexible.
Verified User, Anonymous: With the Iceberg open table format and the Presto engine, performance and flexibility increased, and watsonx.ai adds GenAI capability, which other tools lack as of now.
Verified User, Anonymous: We use IBM watsonx.data as a unified data platform to integrate and govern data across systems, eliminating silos and improving data quality. Its open lakehouse architecture enables faster, trusted access to data for AI, analytics, and reporting, forming the foundation for …
Verified User, Anonymous: Salesforce Genie and Snowflake
Prajwal Shetty, Senior Cloud Solutions Engineer: Oracle is a really cost-effective solution, with community support and rich integration across a wide range of Oracle products. Amazon SageMaker is another cost-effective solution that is tightly coupled with the AWS platform; in terms of performance it copes really …
Verified User, Anonymous: IBM watsonx.data integrates well with other IBM services used in our deployment and provides enterprise-grade security, which is critical for our regulated business.
Shrideep Tamboli, AI Engineer: AstraDB was giving me vector database solutions, Retrieval Augmented Generation features, and even agentic workflows that IBM watsonx.data does not currently have. But given the volume of data coming in that I have to deal with every day, I can do anomaly detection just in plain …
Verified User, Anonymous: Pinecone and IBM watsonx.data (Milvus in our case) both work great as fully managed cloud-based vector databases. We selected IBM watsonx.data because it integrates well with watsonx.ai and is a little more beginner-friendly than Pinecone, but I think both are great anyway.
Verified User, Anonymous: IBM watsonx.data helps reduce data warehousing costs. IBM AIOps Insights focuses mainly on incident management, while IBM watsonx.data provides a flexible data store.
Verified User, Anonymous: Maybe I cannot say why I chose it; the business preferred to use IBM watsonx.data, which is good for me as well to learn. I cannot compare this tool with others because it has unique features that Alteryx, Amazon, or Azure don't have. So this tool is working well for us.
Verified User, Anonymous: I believe DataStax Enterprise is best in class. There are some things that are different with schema-less systems, but I found DataStax Enterprise the easiest to implement while evaluating. The replication is on par with or better than others in practice. We are evaluating …
Julian Chultarsky, Principal Technologist: DataStax Enterprise offered best-in-class write performance and scalability. The customer support team was very helpful in the adoption of new technology.
Verified User, Anonymous: DataStax has an amazing community built around it, and Cassandra is an open-source technology. Customer support is quite good compared to other vendors. Though you initially need to spend a hefty amount on infrastructure, in the long run it makes up for it. We …
Verified User, Anonymous: We chose DataStax because we needed a system that is always available and capable of ingesting a large amount of data per second, even if eventually consistent, with native multi-data-center sync support. We considered Cloudera as an alternative, using Kafka as the ingestion layer, but …
Verified User, Anonymous: Amazon DynamoDB and DataStax Cassandra are similar in masterless architecture and principles; DynamoDB is managed and needs cost analysis. If you need better control, DataStax is better. I also did a prototype with Google Spanner on one of the recent innovation days; it …

Best Alternatives
	Apache Spark	IBM watsonx.data
Small Businesses	No answers on this topic	No answers on this topic
Medium-sized Companies	Cloudera Manager Score 9.9 out of 10	Snowflake Score 8.7 out of 10
Enterprises	IBM Analytics Engine Score 7.1 out of 10	Snowflake Score 8.7 out of 10

User Ratings
	Apache Spark	IBM watsonx.data
Likelihood to Recommend	9.0	8.8
Likelihood to Renew	10.0	7.3
Usability	8.0	7.9
Availability	-	8.2
Performance	-	8.2
Support Rating	8.7	9.1
Online Training	-	8.2
Implementation Rating	-	8.2
Configurability	-	8.2
Ease of integration	-	7.3
Product Scalability	-	7.3
Vendor post-sale	-	8.2
Vendor pre-sale	-	8.2

User Testimonials
Likelihood to Recommend
Apache Spark (Nitin Pasumarthy, Software Engineer; incentivized): Apache Spark has rich APIs for regular data transformations, ML workloads, and graph workloads, whereas other systems may not offer such a wide range of support. Choose it when you need to perform big data transformations as offline jobs, and use MongoDB-like distributed database systems for more real-time queries.
IBM watsonx.data (Verified User, Anonymous; incentivized): DataStax Cassandra is a Java-based, linearly scalable, fault-tolerant, distributed, masterless NoSQL time series database with best-in-class tunable performance and easy-to-use administration and monitoring through OpsCenter. Configured correctly, there is no downtime and no data loss. The documentation is exhaustive, the community is agile and supportive, and DataStax provides good support. For all these reasons, DataStax Cassandra has become a NoSQL technology of choice for many platforms. However, it requires some time investment in infrastructure and regular operational tasks, and if you do not have bandwidth for that, a managed NoSQL solution like DynamoDB might be more appropriate. Also, if you have search needs on Cassandra and do not have a corresponding Spark/Solr setup, DataStax Cassandra might not be ideal for you.

Pros
Apache Spark (Carla Borges, Technical Consultant, Java Developer and PHP Developer; incentivized): It performs conventional disk-based processing when data sets are too large to fit into memory, which is very useful because, regardless of the size of the data, it is always possible to store them. It has great speed and the ability to join multiple types of databases and run different types of analysis applications. This functionality is very useful, as it reduces work time. Apache Spark uses Hadoop's data storage model and can be integrated with other big data frameworks such as HBase, MongoDB, and Cassandra. This is very useful because it is compatible with multiple frameworks the company has, which allows us to unify all our processes.
IBM watsonx.data (Sarthak Chopra, Associate; incentivized): It doesn't just store data but unlocks its potential. I am able to analyze a vast amount of information, identify trends, and predict future outcomes. It gives me data that is not only high quality but also accessible. It handles missing values, outliers, and feature engineering with ease.

Cons
Apache Spark (Anson Abraham, Data Czar; incentivized): Memory management is very weak. PySpark is not as robust as Scala with Spark. Spark master HA is needed; it is not as highly available as it should be. Locality should not be a necessity, though it does help; I would prefer no locality requirement.
IBM watsonx.data (Harshal Pachpande, Analyst; incentivized): Integration complexity with security tools: while watsonx.data is well suited to native tools, integration with third-party security tools requires custom connectors or manual ETL pipelines, which increases setup time. The user interface and query time could be improved.

Likelihood to Renew
Apache Spark (Steven Li, Senior Software Developer (Consultant)): Capacity to compute data in a cluster, and fast speed.
IBM watsonx.data (Robert Xu, Director, Database Administration; incentivized): As an open-source technology, Cassandra can readily be used with or without commercial support. DataStax provides value-added services and features, and in the end it is up to each situation to strike a balance between the desirability of such support and services and the associated cost.

Usability
Apache Spark (Verified User, Anonymous; incentivized): If the team looking to use Apache Spark is not used to debugging and tweaking job settings for maximum optimization, it can be frustrating. However, the documentation and community support on the internet can help resolve most issues. Moreover, it is highly configurable and integrates with different tools (e.g., it can be used by dbt Core), which increases the scenarios where it can be used.
IBM watsonx.data (Verified User, Anonymous; incentivized): DataStax has a good community built around it and has amazing scalability options. Though the initial setup is a bit costly, in the long run it makes up for it. It also has powerful monitoring tools and a clean UI.

Reliability and Availability
Apache Spark: No answers on this topic
IBM watsonx.data (Murali Madhanagopal, Architect; incentivized): Good recovery features.

Performance
Apache Spark: No answers on this topic
IBM watsonx.data (Murali Madhanagopal, Architect; incentivized): Scalable product.

Support Rating
Apache Spark (Yogesh Mhasde, Technical Manager): 1. It integrates very well with Scala or Python. 2. SQL interoperability is very easy to understand. 3. Apache Spark is much faster than competing technologies. 4. Support from the Apache community for Spark is huge. 5. Execution times are faster compared to others. 6. There are a large number of forums available for Apache Spark. 7. Code for Apache Spark is simple and easy to access. 8. Many organizations use Apache Spark, so many solutions are available for existing applications.
IBM watsonx.data (Verified User, Anonymous): We have had a few situations where we caused an outage or something went wrong, and we were able to get a support person to offer live help within minutes. The escalation process is excellent, the best I've seen, and the support team is incredibly strong. Outside of emergencies, the team is very helpful with general questions and with working through data model exercises, and I believe the subscription still comes with some hours to help get the data model reviewed.

Online Training
Apache Spark: No answers on this topic
IBM watsonx.data (Murali Madhanagopal, Architect; incentivized): Easy-to-follow documentation; support is there when needed.

Implementation Rating
Apache Spark: No answers on this topic
IBM watsonx.data (Murali Madhanagopal, Architect; incentivized): We use the SaaS service.

Alternatives Considered
Apache Spark (Ananth Gouri, Assistant Professor; incentivized): We used Surprise Kit for one of our other research projects. It is more fine-tuned to recommendation systems and their algorithms. Apache Spark has MLlib for the majority of ML problems, whereas software like Surprise Kit is suitable only for the specific task of recommendations.
IBM watsonx.data (Verified User, Anonymous): I believe DataStax Enterprise is best in class. There are some things that are different with schema-less systems, but I found DataStax Enterprise the easiest to implement while evaluating. The replication is on par with or better than others in practice. We are evaluating Astra in our test environment, and it has additional benefits we are looking forward to using.

Scalability
Apache Spark: No answers on this topic
IBM watsonx.data (Murali Madhanagopal, Architect; incentivized): Cognos integration works great.

Return on Investment
Apache Spark (Verified User, Anonymous; incentivized): Faster turnaround on feature development: we have seen a noticeable improvement in our agile development since using Spark. Easy adoption: having multiple departments use the same underlying technology, even when the use cases are very different, allows for more commonality among applications, which definitely makes the operations team happy. Performance: we have been able to make some applications run over 20x faster since switching to Spark. This has saved us time, headaches, and operating costs.
IBM watsonx.data (Verified User, Anonymous; incentivized): For one automation project, we managed to cut cloud storage costs by a third through IBM watsonx.data's lakehouse optimization. Data integration projects have had a 20% reduction in turnaround times. I can only imagine how that will improve with the Claude partnership.