Apache Spark is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.
N/A
Oracle Database
Score 8.0 out of 10
N/A
Oracle Database, currently in edition 23ai, is a converged, multimodel database management system. It is designed to simplify development for AI, microservices, graph, document, spatial, and relational applications.
Well suited: For most local runs on datasets and for non-prod systems, scalability is not a problem at all. Being able to include data from multiple types of data sources is an added advantage. MLlib is a decent built-in library that can be used for most ML tasks. Less appropriate: We had to work on a RecSys where the music dataset we used was 300+ GB in size. We faced memory-based issues, and a few times we also got memory errors. Also, the MLlib library does not support advanced analytics or deep-learning frameworks. Understanding the internals of how Apache Spark works is very difficult for beginners.
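To give a feel for the memory pressure described in the review above, here is a rough back-of-the-envelope sketch in plain Python (no Spark required). It estimates how many partitions a 300 GB dataset would need at a given partition size, and how much executor heap each concurrent task could get at most. The 128 MB target, the 8 GB executor, the 4 cores, and both helper names are illustrative assumptions, not values taken from the reviewer's setup.

```python
import math


def estimate_partitions(dataset_gb: float, target_partition_mb: int = 128) -> int:
    """Hypothetical helper: partitions needed so each holds about target_partition_mb."""
    dataset_mb = dataset_gb * 1024
    return max(1, math.ceil(dataset_mb / target_partition_mb))


def per_task_memory_mb(executor_memory_gb: float, cores_per_executor: int) -> float:
    """Hypothetical helper: upper bound on heap per concurrent task.

    Real jobs get less, because Spark reserves part of the heap for
    its own bookkeeping and for cached data.
    """
    return executor_memory_gb * 1024 / cores_per_executor


# Assumed figures for illustration only.
partitions = estimate_partitions(300)      # 300 GB dataset, 128 MB partitions
task_budget = per_task_memory_mb(8, 4)     # 8 GB executor shared by 4 cores

print(partitions)   # 2400
print(task_budget)  # 2048.0
```

If a single partition, once deserialized, needs more than the per-task budget, memory errors like the ones mentioned are a plausible outcome; smaller partitions or larger executors are the usual levers.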
I believe Oracle Database is still the best RDBMS and the database to consider for OLTP applications and ad hoc requests. It is good at data warehousing in certain aspects but not the best. Oracle is also a great database for scaling up with its Clusterware solution, which also makes the database highly available, with services moving to the live instance without much trouble.
There is a lot of sunk cost in a product like Oracle 12c. It is doing a great job, and switching to another product would not provide us much benefit, even if it did the same thing, because of the work involved in making such a switch. It would not be cost effective.
If the team looking to use Apache Spark is not used to debugging and tweaking job settings to get the best performance, it can be frustrating. However, the documentation and the support of the community on the internet can help resolve most issues. Moreover, it is highly configurable and it integrates with different tools (e.g., it can be used by dbt Core), which increases the number of scenarios where it can be used.
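As an illustration of the kind of settings tweaking mentioned above, the following minimal sketch builds a `spark-submit` command line from a dictionary of configuration properties. The property names are standard Spark configuration keys, but the values, the `job.py` file name, and the `spark_submit_args` helper are assumptions made for this example, not recommendations.

```python
def spark_submit_args(settings: dict[str, str], app: str) -> list[str]:
    """Hypothetical helper: turn config properties into spark-submit arguments."""
    args = ["spark-submit"]
    # Sort keys so the generated command is stable between runs.
    for key, value in sorted(settings.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


# Example values only; suitable numbers depend on the cluster and the job.
settings = {
    "spark.executor.memory": "8g",
    "spark.executor.cores": "4",
    "spark.sql.shuffle.partitions": "2400",
}

command = spark_submit_args(settings, "job.py")
print(" ".join(command))
```

Keeping the settings in one place like this makes it easier to compare runs while experimenting with different values.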
Many of the powerful options can be auto-configured, but there are still many things to take into account when installing and configuring an Oracle Database, compared with SQL Server or other databases. At the same time, that extra complexity allows for detailed configuration and guarantees performance, scalability, availability and security.
1. It integrates very well with Scala or Python.
2. Its SQL interoperability is very easy to understand.
3. Apache Spark is much faster than competing technologies.
4. The Apache community offers a great deal of support for Spark.
5. Execution times are faster compared to others.
6. There are a large number of forums available for Apache Spark.
7. Code for Apache Spark is readily available and easy to access.
8. Many organizations use Apache Spark, so many solutions are available for existing applications.
1. I have had very good experiences with the Oracle Database support team. The Oracle support team has a pool of talented Oracle Analyst resources in different regions, to name a few: EMEA, Asia, USA (EST, MST, PST), and Australia. Their support staff are very supportive, well trained, and customer focused. Whenever I open an Oracle Sev1 SR (service request), I always get prompt, timely updates on my case.
2. Oracle has Zoom call and chat session options linked to the Oracle SR. Whenever you are in the Oracle portal, you can chat with the Oracle Analyst who is working on your case. You can request an Oracle Zoom call through which you can share the screen of your problem server in no time. This is very nice, as it saves a lot of time and energy when you have to follow up with Oracle support on your case.
3. Oracle has an excellent knowledge base in which critical customer database problems and their solutions are well documented. It is very easy to follow without consulting the support team first.
Overall, the implementation went very well, and afterward everything turned out as expected in terms of performance and scalability. People should always install and upgrade to a stable version for production with the latest patch set updates, test as thoroughly as possible, and have a backup plan in case anything unexpected happens.
Spark, in comparison to similar technologies, ends up being a one-stop shop. You can achieve so much with this one framework instead of having to stitch and weave together multiple technologies from the Hadoop stack, all while getting incredible performance, minimal boilerplate, and the ability to write your application in the language of your choosing.
Oracle is more of an enterprise-level database than Access, and SAP Adaptive Server Enterprise isn't getting developed much (some people wonder how close it is to end of life), but SQL Server is miles ahead of Oracle IMO in terms of user experience and comparable in terms of performance AFAIK. As stated, a vendor forced our hand to use Oracle, so we did not have a choice. If you are looking for help with an issue you are having, there are lots of SQL Server articles, etc. on the web, and the community of SQL Server developers and DBAs is very strong and supportive. Oracle's help on the web is much more limited and often comes with an attitude of superiority and a lack of compassion, IMO. For instance, check out the Ask Tom Oracle blog - a world of difference. If you choose Oracle, go into it with eyes wide open.
Oracle Database 12c has had a very positive impact on our ability to build strong and robust custom applications in-house without the need to come up with our own methods of data storage and management.
Oracle Database 12c has the strongest user interface of any database I have worked with and is continuously improving with the addition of support for JSON and XML type objects in the database.
Oracle Database 12c is sometimes very heavy and DBA intensive, but the benefits far outweigh the costs we incur for DBA support to enable security and access features.