1. Apache Spark can be up to 100x faster than Hadoop MapReduce for in-memory workloads. 2. Apache Spark is more stable than Amazon EMR. 3. The end-to-end distributed machine learning library (MLlib) is more robust in Apache Spark.
Technical Consultant - Java Developer and PHP Developer.
Chose Apache Spark
I prefer Apache Spark over Hadoop, since in my experience Spark is more usable: it comes with simple APIs for Scala, Python, Java, and Spark SQL, and its REPL gives immediate feedback on commands. At the same time, Apache Spark seems to have the …
Message brokering across different systems, with transactionality and fine-tuned control over what happens using Java (or other languages) instead of a heavy, proprietary language. One situation it doesn't fit very well (as far as I have experienced) is when your workflow requires significant data mapping. While that is possible with Java tooling, the visual data mapping tools in some other integration frameworks are easier to work with.
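For a sense of what that fine-tuned Java control looks like, here is a minimal sketch of a brokering route: messages move from one queue to another with a plain-Java transformation in the middle. It assumes a JMS connection factory has already been registered under the component name "jms"; the queue names are hypothetical.

```java
import org.apache.camel.CamelContext;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.impl.DefaultCamelContext;

public class BrokerRouteSketch {
    public static void main(String[] args) throws Exception {
        CamelContext context = new DefaultCamelContext();
        context.addRoutes(new RouteBuilder() {
            @Override
            public void configure() {
                // Move messages between two queues, transforming them
                // with ordinary Java along the way.
                from("jms:queue:orders.in")       // hypothetical queue name
                    .transacted()                 // run inside a JMS transaction
                    .process(exchange -> {
                        String body = exchange.getIn().getBody(String.class);
                        exchange.getIn().setBody(body.trim().toUpperCase());
                    })
                    .to("jms:queue:orders.out");  // hypothetical queue name
            }
        });
        context.start();
        Thread.sleep(10_000);  // let the route run briefly, then shut down
        context.stop();
    }
}
```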
Well suited: for most local runs over datasets and for non-prod systems, where scalability is not a problem at all. Ingesting data from multiple types of data sources is an added advantage, and MLlib is a decent built-in library that covers most common ML tasks. Less appropriate: we had to build a RecSys where the music dataset we used was around 300+ GB, and we repeatedly ran into memory errors. MLlib also lacks support for advanced analytics and deep-learning frameworks, and understanding Apache Spark's internals is very hard for beginners.
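To illustrate the MLlib point above, here is a minimal sketch of a local MLlib job in Java, training a logistic regression model. It assumes a LIBSVM-formatted training file; the path is hypothetical.

```java
import org.apache.spark.ml.classification.LogisticRegression;
import org.apache.spark.ml.classification.LogisticRegressionModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class MllibSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("mllib-sketch")
            .master("local[*]")   // local run, as described in the review
            .getOrCreate();

        // LIBSVM is one of the formats MLlib reads out of the box;
        // this path is hypothetical.
        Dataset<Row> training = spark.read()
            .format("libsvm")
            .load("data/sample_libsvm_data.txt");

        LogisticRegressionModel model = new LogisticRegression()
            .setMaxIter(10)
            .setRegParam(0.01)
            .fit(training);

        System.out.println("Coefficients: " + model.coefficients());
        spark.stop();
    }
}
```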
Camel has a gentle learning curve. It is fairly well documented and there are about 5-6 books on Camel.
There is a large user group and there are blogs devoted to all things Camel, and the developers of Camel provide quick answers and have also been very quick to patch Camel when bugs are reported.
Camel integrates well with well-known frameworks like Spring and with other middleware products like Apache Karaf and ServiceMix.
There are over 150 components for the Camel framework that help integrate with diverse software platforms.
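To show how those components fit together, here is a minimal sketch of a route that chains three of them (file, log, and ftp); the directories, category name, and FTP host are hypothetical. A RouteBuilder like this can also be registered as an ordinary Spring bean, which is one way the Spring integration mentioned above shows up in practice.

```java
import org.apache.camel.builder.RouteBuilder;

// Each URI scheme below ("file", "log", "ftp") is a separate Camel
// component; swapping systems is mostly a matter of changing the URI.
public class ComponentDemoRoute extends RouteBuilder {
    @Override
    public void configure() {
        // Pick up files from a local inbox without deleting them...
        from("file:data/inbox?noop=true")
            // ...log each message as it passes through...
            .to("log:com.example.transfers?level=INFO")
            // ...and push it to a remote FTP server.
            .to("ftp://user@ftp.example.com/outbox?password=secret");
    }
}
```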
If the team looking to use Apache Spark is not used to debugging and tweaking job settings to ensure maximum optimization, it can be frustrating. However, the documentation and the support of the community on the internet can help resolve most issues. Moreover, it is highly configurable and it integrates with different tools (e.g., it can be used by dbt core), which increases the scenarios where it can be used.
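As a sketch of the kind of settings tweaking described above, here are a few commonly adjusted knobs set through the session builder. The values are purely illustrative, not recommendations, and in practice executor memory is usually fixed at submit time via spark-submit rather than in code.

```java
import org.apache.spark.sql.SparkSession;

public class TunedSessionSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("tuned-job")
            .master("local[*]")
            // Shuffle parallelism: often raised for large joins/aggregations.
            .config("spark.sql.shuffle.partitions", "400")
            // Per-executor heap; normally passed to spark-submit instead.
            .config("spark.executor.memory", "8g")
            // Fraction of heap shared by execution and storage memory.
            .config("spark.memory.fraction", "0.6")
            .getOrCreate();

        // A trivial job just to exercise the configured session.
        spark.range(1_000_000).selectExpr("sum(id)").show();
        spark.stop();
    }
}
```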
1. It integrates very well with Scala or Python. 2. Its SQL interoperability is very easy to understand (see the sketch after this list). 3. Apache Spark is way faster than competing technologies. 4. The Apache community's support for Spark is huge. 5. Execution times are faster compared to others. 6. There are a large number of forums available for Apache Spark. 7. The source code of Apache Spark is simple and easy to gain access to. 8. Many organizations use Apache Spark, so many solutions are available for existing applications.
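As a sketch of the SQL interoperability mentioned in item 2, here is how a DataFrame can be registered as a view and queried with plain SQL from Java; the input file and column names are hypothetical.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SqlInteropSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("sql-interop")
            .master("local[*]")
            .getOrCreate();

        // Hypothetical CSV input; any DataFrame works here.
        Dataset<Row> orders = spark.read()
            .option("header", "true")
            .csv("data/orders.csv");

        // Register the DataFrame as a temp view and query it with plain SQL.
        orders.createOrReplaceTempView("orders");
        Dataset<Row> totals = spark.sql(
            "SELECT customer_id, COUNT(*) AS n FROM orders GROUP BY customer_id");

        // The result is a regular Dataset again, so the DataFrame API
        // and SQL can be mixed freely.
        totals.orderBy(totals.col("n").desc()).show(10);
        spark.stop();
    }
}
```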
If you are looking for a Java-based, open source, low-cost equivalent to webMethods or Azure Logic Apps, Apache Camel is an excellent choice: it is mature and widely deployed, and it is included in many vendor Java application servers such as Red Hat JBoss EAP. Apache Camel is lacking on the GUI tooling side compared to commercial products such as webMethods or Azure Logic Apps.
Compared to similar technologies, Spark ends up being a one-stop shop. You can achieve so much with this one framework instead of having to stitch and weave together multiple technologies from the Hadoop stack, all while getting incredible performance, minimal boilerplate, and the ability to write your application in the language of your choosing.
Very fast time to market, since so many components are available to use immediately.
Error handling mechanisms and patterns of practice are robust and easy to use, which in turn has made our application more robust from the start, with fewer bugs.
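As an example of those error handling patterns, here is a minimal sketch using Camel's dead letter channel plus an onException clause; the queue names and validation rule are hypothetical.

```java
import org.apache.camel.builder.RouteBuilder;

public class ErrorHandlingRouteSketch extends RouteBuilder {
    @Override
    public void configure() {
        // Dead letter channel: retry twice, then park the message
        // on an error queue instead of losing it.
        errorHandler(deadLetterChannel("jms:queue:orders.dlq")
            .maximumRedeliveries(2)
            .redeliveryDelay(1000));

        // Exception-specific handling overrides the default policy.
        onException(IllegalArgumentException.class)
            .handled(true)
            .to("log:com.example.badInput?level=WARN");

        from("jms:queue:orders.in")
            .process(exchange -> validate(exchange.getIn().getBody(String.class)))
            .to("jms:queue:orders.out");
    }

    // Hypothetical validation that triggers the onException path.
    private static void validate(String body) {
        if (body == null || body.isEmpty()) {
            throw new IllegalArgumentException("empty order payload");
        }
    }
}
```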
However, testing and debugging routes is more challenging than working in standard Java, so that takes more time (though still less time than writing the components from scratch).
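That said, Camel's test kit does make route testing tractable. Here is a minimal sketch using CamelTestSupport (from camel-test-junit5) with direct and mock endpoints, so no broker or file system is needed; the route under test is hypothetical.

```java
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.component.mock.MockEndpoint;
import org.apache.camel.test.junit5.CamelTestSupport;
import org.junit.jupiter.api.Test;

public class UppercaseRouteTest extends CamelTestSupport {

    @Override
    protected RouteBuilder createRouteBuilder() {
        return new RouteBuilder() {
            @Override
            public void configure() {
                // Hypothetical route under test: upper-case the body.
                from("direct:in")
                    .process(e -> e.getIn().setBody(
                        e.getIn().getBody(String.class).toUpperCase()))
                    .to("mock:out");
            }
        };
    }

    @Test
    public void upperCasesTheBody() throws Exception {
        MockEndpoint out = getMockEndpoint("mock:out");
        out.expectedBodiesReceived("HELLO");

        template.sendBody("direct:in", "hello");

        out.assertIsSatisfied();
    }
}
```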
Most people don't know Camel coming in, and many junior developers find it overwhelming and are not enthusiastic about learning it, so finding people who want to develop and maintain it is a challenge.