A great tool for mid-size organizations that could become a great tool for mid-to-large ones
Use Cases and Deployment Scope
Pros
- Integration with Azure Data Factory
- Integration with Azure Management API
- Easy to deploy from PowerShell, API, ADFv2 and other Azure resources
- Cheap when on-demand clusters are created and deleted dynamically
- Extensive documentation
- Some diagnostic tools
Cons
- Unstable for months, with strange problems
- By default, generates huge logs that eventually break the cluster
- Difficult to control the escalation of its resource usage
- Difficult to plan, monitor and limit its costs
- On-demand clusters are very unstable and unreliable
- Cluster creation process fails often
- Cost is huge and difficult to control or limit
- Dead weight (and cost) of products bundled into the cluster that you will never use (we only needed Spark and Hadoop)
- Poor support team
- Outdated software in the clusters (Python versions that are almost out of support)
- Almost impossible to customize and adapt to our specific needs
Return on Investment
- When our business started, it was great value for a company with no tech experts
- As the business and its data volumes grew, problems started to arise: many on-demand cluster creations failed, and clusters crashed internally because of the size of their logs (there was no way to limit them)
- When the business got really big, and after months of outages, we had to start looking for a new platform. It was no longer worthwhile in terms of ROI.

