Item: Apache Spark
Rating: 10
Author: Carla Borges

Use Cases and Deployment Scope

Apache Spark is used across our whole organization. It helps us a lot with moving and processing data, as it can run up to 100 times faster than Hadoop MapReduce in memory and up to 10 times faster on disk. We work with Java, and Spark offers a native Java API; since it also supports SQL, it fits the needs of our organization well, given the large amount of information we handle. We strongly prefer Apache Spark because its in-memory processing improves the performance of big data analysis applications.

Pros and Cons

When data sets are too large to fit into memory, it falls back to conventional disk-based processing, which is very useful because, regardless of the size of the data, it is always possible to process it.
It is very fast and can combine multiple types of databases and run different types of analysis applications. This is super useful, as it reduces work times.
Apache Spark works with Hadoop's data storage (HDFS) and can be integrated with other big data systems such as HBase, MongoDB, and Cassandra. This is very useful because it is compatible with multiple systems the company already has, and thus allows us to unify all the processes.
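
To picture the memory-then-disk behavior from the first point above, here is a toy sketch in plain Java. It does not use Spark and is not how Spark is implemented; the SpillingStore class, its budget counted in partitions, and the temp-file layout are all made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy store: keeps partitions in memory up to a budget, spills the rest to temp files. */
final class SpillingStore {
    private final int maxInMemory; // assumed budget, counted in partitions rather than bytes
    private final Map<Integer, List<String>> memory = new HashMap<>();
    private final Map<Integer, Path> disk = new HashMap<>();

    SpillingStore(int maxInMemory) {
        this.maxInMemory = maxInMemory;
    }

    /** Stores a partition in memory if there is room, otherwise writes it to a temp file. */
    void put(int partitionId, List<String> rows) {
        if (memory.size() < maxInMemory) {
            memory.put(partitionId, new ArrayList<>(rows));
            return;
        }
        try {
            Path file = Files.createTempFile("partition-" + partitionId + "-", ".txt");
            file.toFile().deleteOnExit();
            Files.write(file, rows); // one row per line
            disk.put(partitionId, file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads a partition back, from memory when possible and from disk otherwise. */
    List<String> get(int partitionId) {
        List<String> rows = memory.get(partitionId);
        if (rows != null) {
            return rows;
        }
        Path file = disk.get(partitionId);
        if (file == null) {
            throw new IllegalArgumentException("unknown partition " + partitionId);
        }
        try {
            return Files.readAllLines(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int inMemoryCount() {
        return memory.size();
    }

    int onDiskCount() {
        return disk.size();
    }

    public static void main(String[] args) {
        SpillingStore store = new SpillingStore(2); // room for only 2 partitions in memory
        for (int p = 0; p < 5; p++) {
            store.put(p, List.of("row-" + p + "-a", "row-" + p + "-b"));
        }
        // prints "2 in memory, 3 on disk"
        System.out.println(store.inMemoryCount() + " in memory, " + store.onDiskCount() + " on disk");
        // a spilled partition is still readable: prints "[row-4-a, row-4-b]"
        System.out.println(store.get(4));
    }
}
```

The point of the sketch is only that running out of memory slows things down rather than making the job fail: all five partitions remain readable even though only two fit in memory.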

More documentation and training should come with the application, especially for debugging, since the process is difficult to understand.
It should be more attentive to users and offer tutorials to reduce the learning curve.
There should be more clustering (grouping) algorithms.

Return on Investment

It has had a very positive impact, as it helps reduce the data processing time and thus helps us achieve our goals much faster.
Being easy to use, it allowed us to adapt to the tool much faster than with others. It also lets us access various data sources and run on Hadoop, Apache Mesos, Kubernetes, standalone, or in the cloud. This makes it very useful.
It was very easy for me to use Apache Spark and learn it since I come from a background of Java and SQL, and it shares those basic principles and uses a very similar logic.

Alternatives Considered

Hadoop

I prefer Apache Spark to Hadoop, since in my experience Spark is more usable and comes equipped with simple APIs for Scala, Python, and Java, plus Spark SQL, and it gives immediate feedback on commands through its REPL. At the same time, Apache Spark seems to have the best performance when processing large data in memory and, therefore, more processes can be offloaded to Spark than to Hadoop, although Hadoop is also a very useful tool.

Other Software Used

Hadoop, Cassandra, Apache Camel, Apache CloudStack, Apache OpenOffice

Likelihood to Recommend

It is well suited to processing large amounts of data, as it is very easy to use and its syntax is simple and understandable. I also find it useful across a variety of applications without needing to integrate many other processing technologies. It is very fast and has many machine learning algorithms that can be used for data problems. I find it less appropriate for smaller data sets, as it uses too many resources.

Very useful application for Big Data processing and excellent for large volume production workflows

Overall Satisfaction with Apache Spark