Overall Satisfaction with Apache Spark
Apache Spark is used across our whole organization. It helps us a lot with data processing, as it can be up to 100 times faster than Hadoop MapReduce in memory and up to 10 times faster on disk. We work in Java, and Spark provides a native Java API; because it is also compatible with SQL, it fits the needs of our organization, which handles a large amount of information. We strongly prefer Apache Spark because it supports in-memory processing, which increases the performance of big data analysis applications.
- It falls back to conventional disk-based processing when the data sets are too large to fit into memory, which is very useful because it can still handle the data regardless of its size.
- It is fast and can combine data from multiple types of databases and run different types of analysis applications. This is very useful because it reduces working time.
- The documentation and training materials that come with the application should be expanded, especially for debugging, since that process is difficult to understand.
- The project should pay more attention to users and provide more tutorials to reduce the learning curve.
- There should be more clustering algorithms.
- It has had a very positive impact, as it helps reduce the data processing time and thus helps us achieve our goals much faster.
- Because it is easy to use, we adapted to the tool much faster than to others. It also runs in various environments, such as Hadoop, Apache Mesos, Kubernetes, standalone, or in the cloud, which makes it very useful.
- It was very easy for me to learn and use Apache Spark since I come from a background in Java and SQL, and it shares those basic principles and uses very similar logic.
I prefer Apache Spark to Hadoop, since in my experience Spark is more usable: it comes with simple APIs for Scala, Python, Java, and Spark SQL, and it gives immediate feedback on commands through an interactive shell (REPL). Apache Spark also seems to perform best when processing large data sets in memory, so more workloads can be offloaded to Spark than to Hadoop, although Hadoop is also a very useful tool.