Well suited: For most local runs on datasets and non-production systems, scalability is not a problem at all. Being able to include data from multiple types of data sources is an added advantage. MLlib is a decent built-in library that covers most ML tasks. Less appropriate: We worked on a recommender system (RecSys) whose music dataset was over 300 GB in size, and we ran into memory-related issues, including occasional memory errors. MLlib also lacks support for advanced analytics and deep-learning frameworks. Understanding the internals of Apache Spark is very difficult for beginners.
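To give a sense of the scale behind a 300 GB dataset, here is a rough back-of-the-envelope sketch in Python. It assumes a target of about 128 MB per partition, which matches Spark's default for `spark.sql.files.maxPartitionBytes`; the function name `estimate_partitions` is hypothetical, and the right partition size for a real job depends on the cluster and the workload.

```python
import math


def estimate_partitions(dataset_gb: float, target_partition_mb: int = 128) -> int:
    """Estimate how many partitions a dataset splits into at a given target size.

    The 128 MB default is an assumption borrowed from Spark's default
    spark.sql.files.maxPartitionBytes; it is a starting point, not a rule.
    """
    dataset_mb = dataset_gb * 1024  # convert GB to MB
    return math.ceil(dataset_mb / target_partition_mb)


# A 300 GB dataset at ~128 MB per partition
print(estimate_partitions(300))  # 2400
```

If each partition has to fit comfortably in executor memory, a figure like this hints at why a dataset of this size may need deliberate memory and partition tuning rather than default settings.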
I think it's a great product. We apply Oracle GoldenGate to several use cases in our organization: 1. Business continuity planning. 2. Query offloading through data replication to a reporting instance of our data. 3. Data transformations, which we are looking into to help support various queries for different teams within the business.
If the team looking to use Apache Spark is not used to debugging and tweaking job settings to get the best performance, it can be frustrating. However, the documentation and the community support available online can help resolve most issues. Moreover, it is highly configurable and integrates with different tools (e.g., it can be used by dbt Core), which increases the range of scenarios where it can be used.
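As an illustration of the kind of tweaking involved, the sketch below builds a `spark-submit` command line from a dictionary of settings. The configuration keys are real Spark properties, but the values, the script name `my_job.py`, and the helper `spark_submit_command` are assumptions chosen for illustration, not recommendations.

```python
import shlex


def spark_submit_command(app_path: str, conf: dict) -> str:
    """Build a spark-submit command string with one --conf flag per setting."""
    parts = ["spark-submit"]
    for key, value in sorted(conf.items()):  # sorted for a stable, readable order
        parts += ["--conf", f"{key}={value}"]
    parts.append(app_path)
    return shlex.join(parts)


# Illustrative values only; suitable numbers depend on the cluster and the job.
tuning = {
    "spark.executor.memory": "8g",
    "spark.executor.cores": "4",
    "spark.sql.shuffle.partitions": "400",
}

print(spark_submit_command("my_job.py", tuning))
```

Keeping the settings in one dictionary makes it easier to record which combination was tried for which job, which tends to help when tuning is done by trial and error.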
Once set up, it's very easy to use and keep running; it's getting to that point that some may find cumbersome. Also, depending on the data that you want to replicate, the configuration files can become difficult to maintain. The learning curve can be steep for those who are less experienced with databases and transactions.
1. It integrates very well with Scala or Python. 2. Its SQL interoperability is very easy to understand. 3. Apache Spark is much faster than competing technologies. 4. Support from the Apache community for Spark is very strong. 5. Execution times are faster compared to others. 6. There are a large number of forums available for Apache Spark. 7. Code for Apache Spark is simple to find and easy to access. 8. Many organizations use Apache Spark, so many solutions are available for existing applications.
Oracle Support for Oracle GoldenGate has been quite responsive and helpful in the few situations where we've needed it. Furthermore, the documentation on Oracle GoldenGate is so good that we often do not need to contact support at all: the fix is already documented, and we can apply it ourselves without opening a ticket.
We've had Oracle consultants come as well for training days to talk about new features, parts of Oracle GoldenGate we may not be using and things of that nature. The consultants they send are great as they're very knowledgeable about all things Oracle GoldenGate and great resources for any questions or concerns you may have with the product.
We used Oracle University for our Oracle GoldenGate training and it was top notch. We were able to turn our whole DBA team from Oracle GoldenGate newbies into Oracle GoldenGate troubleshooting experts in a matter of a few days. While this obviously did not come cheap, the company felt that it was worth the investment.
If Oracle GoldenGate is new to your organization, expose as many DBAs as possible to it. Having your whole team fluent in it will help overcome early operational hurdles and allow it to become an accepted and supported part of your platform more quickly, which in turn will enable the business to use it to its fullest.
Spark, in comparison to similar technologies, ends up being a one-stop shop. You can achieve so much with this one framework instead of having to stitch and weave together multiple technologies from the Hadoop stack, all while getting incredible performance, minimal boilerplate, and the ability to write your application in the language of your choosing.
We use Oracle Data Guard as a backup tool, but not for data replication. Data Guard is not suited for real-time data replication into our non-normalized reporting database, nor for the database we are using for our upgrade project, as Data Guard is not able to transform data or synchronize data into different schemas, both of which are necessary for our project. Additionally, our project database is on Oracle 12g, not 11i, and I am not 100% sure Data Guard is able to replicate from 11i to 12g.
I have never had any issues with scaling Oracle GoldenGate itself. Oracle GoldenGate Monitor does have scaling issues, but now that Oracle GoldenGate can be monitored by Oracle Enterprise Manager, this is no longer a problem, in my opinion.
In earlier versions, DDL support was limited and source tables needed primary key constraints. This meant I had to create partitions and sub-partitions, run truncations, and perform other operations myself whenever they were performed in the source systems. I also had to talk to the source system administrators and convince them to create primary keys for the replicated tables.
But both issues are solved now.
Installation is straightforward and easy.
Deployed everything within Oracle Data Integrator.
Developing 1,000 ODI interfaces for loading into the Operational Data Store took no more than 100 man-days, while adding them to GoldenGate is taking no more than 5 man-days.
Management Pack and Veridata are additional packs for your management and data verification needs.