Likelihood to Recommend
Well suited: For most local runs on datasets and for non-production systems, scalability is not a problem at all. Being able to include data from multiple types of data sources is an added advantage. MLlib is a decent built-in library that can be used for most ML tasks. Less appropriate: We had to work on a recommender system (RecSys) where the music dataset we used was over 300 GB in size. We ran into memory-related issues, and a few times we got memory errors. Also, MLlib lacks support for advanced analytics and deep-learning frameworks. Understanding the internals of Apache Spark is very difficult for beginners.
In terms of cloud computing, Microsoft Azure is the only comprehensive solution the company offers. Any organization, big or small, can make use of it. As a cyber-security professional, this is your best option for data management. A business that wants to minimize capital expenditures can use Microsoft Azure. It works with many Microsoft services. People with little or no knowledge of cloud computing may find it very difficult to use. It isn't the solution for companies that don't want to risk depending on a single platform and infrastructure vendor.
Pros
Apache Spark makes processing very large data sets possible. It handles these data sets fairly quickly.
Apache Spark does a fairly good job implementing machine learning models for larger data sets.
Apache Spark seems to be rapidly advancing software, with new features making it ever more straightforward to use.

Azure provides an end-to-end life cycle. From development to automated deployment, you will find [a] bunch of options. Custom hook points allow [integration] with on-premises resources as well.
Excellent documentation around all the services makes it really easy for any novice. Overall support by [the] community and the Azure technical team is exceptional.
Bot services, Computer Vision services, and ML frameworks provide excellent results compared to similar services from other giants in the same space.
Azure data services provide excellent support for ingesting data from different sources, ETL, and consuming data for BI purposes.
Cons
Memory management is very weak.
PySpark is not as robust as Scala with Spark.
Spark master HA is needed; it is not as highly available as it should be.
Locality should not be a necessity, but it does help. I would prefer no locality requirement.

In our experience, Azure Kubernetes Service was difficult to set up, which is why we used Kubernetes on top of VMs.
The Azure REST API is a bit difficult to use, which made it hard for us to automate our interactions with Azure.
Azure's web UI does a good job of showing metrics on individual VMs, but it would be great if there were a way to show certain metrics from multiple VMs on one dashboard, for example hard drive usage on our database VMs.
Likelihood to Renew
Its capacity for processing data in a cluster, and its speed.
Moving to Azure was and still is an organizational strategy and not simply changing vendors. Our product roadmap revolved around Azure as we are in the business of humanitarian relief and Azure and Microsoft play an important part in quickly and efficiently serving all of the world. Migration and investment in Azure should be considered as an overall strategy of an organization and communicated companywide.
Usability
The only thing I dislike about Spark's usability is the learning curve: there are many actions and transformations. However, its wide range of uses for ETL processing, its ease of integration, and its multi-language support make this library a powerhouse for your data science solutions. It has especially aided us with its lightning-fast processing times.
Microsoft Azure's overall usability has been better than expected. Oftentimes vendors promise the world, only to leave you with a run-down town. That was not the case in our experience. From an implementation perspective, everything went perfectly, and on the user-facing side we have had no technical issues, just some learning-curve issues that are more about "why" than "how".
Reliability and Availability
It has proven to be unreliable in our production environment, and services become unavailable without proper notification to system administrators.
Support Rating
1. It integrates very well with Scala and Python.
2. Its SQL interoperability is very easy to understand.
3. Apache Spark is much faster than competing technologies.
4. The Apache community provides a great deal of support for Spark.
5. Execution times are faster compared to others.
6. There are a large number of forums available for Apache Spark.
7. Code for Apache Spark is readily available and easy to access.
8. Many organizations use Apache Spark, so many solutions are available for existing applications.
Support is easy with all the knowledge base articles available for free on the web. Plus, if you have a preferred status you can leverage their concierge support to get rapid response. Sometimes they’ll bounce you around a lot to get you to the right person, but they are quite responsive (especially when you are paying for the service). Many of the older Microsoft skills are also transferable from old-school on-prem to Azure-based virtual interfaces.
Implementation Rating
As I mentioned before, the Oracle version mismatch issues that delayed moving one of my platforms justify my rating of 7.
Alternatives Considered
All the above systems work quite well on big data transformations, whereas Spark really shines with its broader API support and its ability to read from and write to multiple data sources. Using Spark, one can easily switch between declarative, imperative, and functional programming styles based on the situation. Also, it doesn't need special data ingestion or indexing pre-processing, as some of the alternatives do. Combined with Jupyter Notebooks, it lets one develop Spark code interactively in Scala or Python.
As I continue to evaluate the "big three" cloud providers for our clients, I make the following distinctions, though the gap continues to close. AWS is more granular and inherently more powerful in its configuration options than [Microsoft] Azure. It is a "developer" platform for cloud. However, Azure PowerShell is helping close this gap. Google Cloud is the leading containerization platform, largely thanks to it building Kubernetes from the ground up. Azure containerization is getting better at offering the same storage and deployment options.
Return on Investment
Faster turnaround on feature development: we have seen a noticeable improvement in our agile development since using Spark.
Easy adoption: having multiple departments use the same underlying technology, even when the use cases are very different, allows for more commonality among applications, which definitely makes the operations team happy.
Performance: we have been able to make some applications run over 20x faster since switching to Spark. This has saved us time, headaches, and operating costs.

Brings down capex for customers.
Some of the built-in security features, such as the DDoS Basic protection that comes with VNET on Azure or even WAF on AGW, bring huge advantages to customers.
Customers who have Software Assurance can save even more through hybrid benefits when moving to Azure.