A great tool for mid-size organizations that could become a great tool for mid-to-large ones
Use Cases and Deployment Scope
Azure HDInsight is our company's main ETL and data processing platform. It hosts all the processes that generate the data we provide to our customers, managed by Azure Data Factory (ADF) pipelines.
Pros
- Integration with Azure Data Factory
- Integration with Azure Management API
- Easy to deploy from PowerShell, the API, ADFv2, and other Azure resources
- Cheap when on-demand clusters are created and deleted dynamically
- Extensive documentation
- Some diagnostic tools
Cons
- Unstable for months, with bizarre problems
- By default, it generates huge logs that eventually break the cluster
- Difficult to control how its resources scale
- Difficult to plan, monitor and limit its costs
- On-demand clusters are very unstable and unreliable
- Cluster creation process fails often
- Costs are huge and difficult to control or limit
- Dead weight (and cost) of bundled products in the cluster that you will never use (we only needed Spark and Hadoop)
- Poor support team
- Outdated software in the clusters (Python versions that are almost out of support)
- Almost impossible to customize and adapt to our specific needs
Likelihood to Recommend
Well suited:
A small to mid-sized company that has no immediate plans to grow its data processing volume and can afford long response times from support.
It also helps if you would rather not get your hands dirty with Linux and Spark configuration. In fact, it can make things go much faster if you also work with the bundled Jupyter.
And if you need to perform diagnostic or administrative tasks, it is full of tools to find and understand the root cause. Ideal for non-experts.
Less appropriate:
A big data company with intensive on-demand cluster creation, mission-critical workloads, a need to reduce costs, or a requirement for the latest library versions or sophisticated customizations.

