IBM offers Db2 Big SQL, an enterprise-grade, hybrid, ANSI-compliant SQL-on-Hadoop engine that delivers massively parallel processing (MPP) and advanced data query. Big SQL offers a single database connection or query for disparate sources such as HDFS, RDBMS, NoSQL databases, object stores and WebHDFS.
Well suited: For most locally run datasets and non-prod systems, scalability is not a problem at all. Being able to include data from multiple types of data sources is an added advantage. MLlib is a decent built-in library that can be used for most ML tasks. Less appropriate: We had to work on a RecSys where the music dataset we used was 300+ GB in size. We faced memory-based issues, and a few times we also got memory errors. Also, MLlib does not support advanced analytics or deep-learning frameworks. For beginners, understanding the internals of how Apache Spark works is very difficult.
My recommendation obviously would depend on the application. But I think given the right requirements, IBM DB2 Big SQL is definitely a contender for a database platform, especially when disparate data and multiple data stores are involved. I like the fact that I can use the product to federate my data and make it look like it's all in one place. The engine is high performance, and if you want to use Hadoop, this could be your platform.
If the team looking to use Apache Spark is not used to debugging and tweaking job settings to get the most out of optimizations, it can be frustrating. However, the documentation and the community support available online can help resolve most issues. Moreover, it is highly configurable and integrates with different tools (e.g., it can be used by dbt Core), which increases the number of scenarios where it can be used.
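To give a rough idea of what "tweaking settings for jobs" can look like, here is a minimal sketch in Python. The configuration keys are standard Spark properties, but every value is an assumed starting point for experimentation rather than anything the reviewer reported, and the helper names `build_session` and `as_submit_args` are hypothetical.

```python
# Illustrative Spark job settings. The keys are real Spark properties;
# the values are assumptions to experiment with, not recommendations.
TUNING = {
    "spark.executor.memory": "8g",          # heap size per executor
    "spark.sql.shuffle.partitions": "400",  # partitions used by SQL shuffles (Spark default is 200)
    "spark.sql.adaptive.enabled": "true",   # allow adaptive query execution to re-plan at runtime
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
}


def build_session(app_name, settings):
    """Create a SparkSession with the given settings applied (needs pyspark installed)."""
    # Imported lazily so the settings dict can be inspected without Spark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in settings.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def as_submit_args(settings):
    """Render the same settings as spark-submit style --conf arguments."""
    args = []
    for key, value in sorted(settings.items()):
        args += ["--conf", f"{key}={value}"]
    return args
```

Keeping the settings in one dictionary makes it easier to change a single value, rerun the job, and compare results, which is most of what the tuning loop amounts to in practice.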
IBM DB2 is a solid service but hasn't seen much innovation over the past decade. It gets the job done and supports our IT operations across digital, so it is a fair choice.
1. It integrates very well with Scala or Python.
2. SQL interoperability is very easy to understand.
3. Apache Spark is way faster than competing technologies.
4. Support from the Apache community for Spark is huge.
5. Execution times are faster compared to others.
6. There are a large number of forums available for Apache Spark.
7. Code for Apache Spark is readily available and easy to access.
8. Many organizations use Apache Spark, so many solutions are available for existing applications.
IBM did a good job of supporting us during our evaluation and proof of concept. They were able to provide all necessary guidance, answer questions, help us architect it, etc. We were pleased with the support provided by the vendor. I will caveat that this support was all before the sale; however, we have a ton of IBM products and they provide the same high level of support for all of them. I didn't see this being any different. I give IBM support two thumbs up!
Spark, in comparison to similar technologies, ends up being a one-stop shop. You can achieve so much with this one framework instead of having to stitch and weave together multiple technologies from the Hadoop stack, all while getting incredible performance, minimal boilerplate, and the ability to write your application in the language of your choosing.
MS SQL Server was ruled out because we didn't feel we could collapse environments. We thought of MS SQL Server as more of a one-for-one replacement for Sybase ASE, i.e., server for server. SAP HANA was evaluated and given a big thumbs up but was rejected because the SQL would have had to be rewritten at the time (now they have an accelerator so you don't have to). Also, there was a very low adoption rate within the enterprise. IBM DB2 Big SQL was not selected even though it achieved high technical scores, because we could not find readily available talent and because of its low adoption rate within the enterprise (basically no adoption at the time). We ended up selecting Exadata because of its high adoption rate within the enterprise, even though HANA and Big SQL were technically superior in our evaluations.