Apache Spark is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.
Splunk Enterprise
Score 8.5 out of 10
Splunk is software for searching, monitoring, and analyzing machine-generated big data, via a web-style interface. It captures, indexes and correlates real-time data in a searchable repository from which it can generate graphs, reports, alerts, dashboards and visualizations.
We have also used ELK (Elasticsearch, Logstash, Kibana) with some benefits, but Splunk is way better than ELK. We also use AWS CloudWatch for our Lambdas running in AWS. However, CloudWatch is not a replacement for Splunk.
Well suited: Local runs on most datasets and non-prod systems, where scalability is not a problem at all. Being able to include data from multiple types of data sources is an added advantage. MLlib is a decent built-in library that can be used for most ML tasks. Less appropriate: We had to work on a RecSys where the music dataset we used was over 300 GB in size, and we ran into memory-related issues. A few times we also got memory errors. Also, MLlib does not support advanced analytics or deep-learning frameworks. Understanding the internals of how Apache Spark works is very difficult for beginners.
I'm liking the newer products, and I'm looking forward to seeing how they integrate with the overall product when they come together. You can just log in and query a large number of systems for similar issues or a unique one. That is a great fit for Splunk Enterprise: looking for a simple case or a simple string across multiple machines. Whatever your scenario is, it's a great fit for finding a particular issue, piece of software, or string across any particular server or group of servers, so that you can update, do a deployment, or whatever it is you're looking to do.
We use Splunk extensively in our projects, and we recently upgraded to Splunk version 6.0, which is quite efficient and gives the expected results. We keep track of the updates and new features Splunk introduces periodically and try to bring those features into our day-to-day activities to improve our reporting system and other tasks.
If the team looking to use Apache Spark is not used to debugging and tweaking job settings to get the best performance, it can be frustrating. However, the documentation and the support of the online community can help resolve most issues. Moreover, it is highly configurable and integrates with different tools (e.g., it can be used by dbt Core), which increases the number of scenarios where it can be used.
You can literally throw a single word into Splunk and it will pull back all instances of that word across all of your logs for the time span you select (provided you have permission to see that data). We have several users who have taken a few of the free courses from Splunk and are able to pull data out of it every day with little help at all.
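The behavior described above (one keyword, a time span, and a permission check) can be sketched conceptually in plain Python. This is only an illustration of the idea, not Splunk's implementation or its search language: `LogEvent`, `keyword_search`, the `index` field, and the sample events are all hypothetical, and case-insensitive matching is an assumption made for the sketch.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LogEvent:
    timestamp: datetime
    index: str  # hypothetical grouping of logs that read permissions apply to
    host: str
    raw: str    # the raw log line


def keyword_search(events, term, earliest, latest, allowed_indexes):
    """Return events that contain `term` (case-insensitive, an assumption),
    fall inside [earliest, latest], and sit in an index the user may read."""
    needle = term.lower()
    return [
        e for e in events
        if e.index in allowed_indexes
        and earliest <= e.timestamp <= latest
        and needle in e.raw.lower()
    ]


# Made-up sample data, for illustration only.
EVENTS = [
    LogEvent(datetime(2024, 1, 1, 9, 0), "web", "web-01", "GET /login 200"),
    LogEvent(datetime(2024, 1, 1, 9, 5), "web", "web-02", "Timeout while calling payments"),
    LogEvent(datetime(2024, 1, 1, 9, 7), "payments", "pay-01", "timeout connecting to db"),
    LogEvent(datetime(2024, 1, 2, 9, 0), "web", "web-01", "timeout on /checkout"),
]

# A user who may only read the "web" index searches one day for "timeout".
hits = keyword_search(
    EVENTS,
    "timeout",
    earliest=datetime(2024, 1, 1, 0, 0),
    latest=datetime(2024, 1, 1, 23, 59),
    allowed_indexes={"web"},
)
for e in hits:
    print(e.host, e.raw)
```

In this sketch the `payments` event is filtered out by the permission check and the January 2 event by the time span, so only the `web-02` line is returned.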
1. It integrates very well with Scala or Python.
2. Its SQL interoperability is very easy to understand.
3. Apache Spark is much faster than competing technologies.
4. The support from the Apache community for Spark is huge.
5. Execution times are faster compared to others.
6. There are a large number of forums available for Apache Spark.
7. Code for Apache Spark is readily available and easy to access.
8. Many organizations use Apache Spark, so many solutions are available for existing applications.
Splunk maintains a well-resourced support system that has been consistent since we purchased the product. They help out in a timely manner and provide expert-level information as needed. We typically open cases online, communicate via e-mail when possible, and are able to resolve most issues that way.
The online course was simple and clear, and it described the main capabilities of the solution. There is also an initial module that can be done for free, so anyone can familiarize themselves with the functionality of this solution. On the other hand, there could be more free online courses, maybe even with a certificate. This would broaden the group of people who are familiar with the platform while increasing familiarity with the solution itself.
Spark, in comparison to similar technologies, ends up being a one-stop shop. You can achieve so much with this one framework instead of having to stitch and weave together multiple technologies from the Hadoop stack, all while getting incredible performance, minimal boilerplate, and the ability to write your application in the language of your choosing.
A lot of products natively have their own dashboards and/or their own logging repositories. Each one is difficult to learn, or too complex, or not easy to mine for the data that you're looking for. That could be anything, including the native logging that you find in other Cisco products. It's easier to use Splunk to draw out the data that you're looking for than to go to the individual products themselves to get the logs.
Splunk has allowed developers to diagnose production issues after their direct access to view items in production environments was taken away, and I believe that is invaluable.
At times some developers weren't super happy about using it, but that was more because they were used to having production access and not to creating Splunk queries to get information.
Having one place to go to view logs was very beneficial.