Apache Spark is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.
N/A
NGINX
Score 9.8 out of 10
Mid-Size Companies (51-1,000 employees)
NGINX, a business unit of F5 Networks, powers over 65% of the world's busiest websites and web applications. NGINX started out as an open source web server and reverse proxy, built to be faster and more efficient than Apache. Over the years, NGINX has built a suite of infrastructure software products to tackle some of the biggest challenges in managing high-transaction applications. NGINX offers a suite of products to form the core of what organizations need to create…
Well suited: For local runs on most datasets and for non-prod systems, scalability is not a problem at all. Being able to include data from multiple types of data sources is an added advantage. MLlib is a decent built-in library that can be used for most ML tasks. Less appropriate: We worked on a RecSys where the music dataset was 300+ GB in size, and we faced memory-based issues; a few times we also got memory errors. Also, MLlib does not support advanced analytics or deep-learning frameworks. Understanding the internals of how Apache Spark works is very difficult for beginners.
[NGINX] is very well suited for high performance. I have seen it used on servers with 1k concurrent connections with no issues. Despite seeing it used in many environments, I've never seen software developers use it over Apache, Express, or IIS in local dev environments, so it may be more difficult to set up. I've also seen it used to load balance, again without issues.
Customer support can be strangely condescending, perhaps it's a language issue?
I find it a little weird how the release versions used for Nginx+ aren't the same as for the open source version. Because of this, it can be very confusing to determine the cross-compatibility of modules, etc.
It seems like some (most?) modules on their own site are ancient and no longer supported, so their documentation in this area needs work.
It's difficult to navigate between the nginx.com commercial site and customer support. They need to be integrated.
I'd love to see more work done on nginx+ monitoring without requiring logging every request. I understand that many statistics can only be derived from logs, but plenty should work without that. Logging is not an option in many environments.
If the team looking to use Apache Spark is not used to debugging and tweaking job settings to get the most out of them, it can be frustrating. However, the documentation and the community support available online can help resolve most issues. Moreover, it is highly configurable and integrates with different tools (e.g., it can be used by dbt Core), which increases the number of scenarios where it can be used.
Nginx's front-end proxy and reverse proxy are always useful. I always prefer Nginx for overall usability when you have an application server and a database, or multiple application servers and a single database, i.e., a clustered application. Nginx provides really good features and flexibility, which help the system administrator with both troubleshooting and administration. Also, Nginx doesn't delay any request because of internal performance issues.
1. It integrates very well with Scala or Python.
2. Its SQL interoperability is very easy to understand.
3. Apache Spark is way faster than other competing technologies.
4. The support from the Apache community for Spark is huge.
5. Execution times are faster compared to others.
6. There are a large number of forums available for Apache Spark.
7. Code for Apache Spark is readily available and easy to access.
8. Many organizations use Apache Spark, so many solutions are available for existing applications.
Community support is great, and they've also had a presence at conferences. Overall, there is no shortage of documentation and community support. We're currently using it to serve up some WordPress sites, and configuring NGINX for this purpose is well documented.
Spark, in comparison to similar technologies, ends up being a one-stop shop. You can achieve so much with this one framework instead of having to stitch and weave together multiple technologies from the Hadoop stack, all while getting incredible performance, minimal boilerplate, and the ability to write your application in the language of your choosing.
We have used Traffic, Apache, Google Cloud Load Balancing and other managed cloud-based load balancers. When it comes to scale and customization, nothing beats Nginx. We selected Nginx over the others because:
we have a large number of services and we can manage a single Nginx instance for all of them
we have high impact services and Nginx never breaks a sweat under load
individual services have special considerations and Nginx lets us configure each one uniquely
Nginx has decreased the burden of web server administration and maintenance, and we are spending less time on server issues than when we were using Apache.
Nginx has allowed more people in our company to get involved with configuring things on the web server, so there's no longer a single point of failure ("the Apache guy").
Nginx has given us the ability to handle a larger number of requests without scaling up in hardware quite so quickly.