Databricks offers the Databricks Lakehouse Platform (formerly the Unified Analytics Platform), a data science platform and Apache Spark cluster manager. The Databricks Unified Data Service provides a platform for data pipelines, data lakes, and data platforms.
$0.07
Per DBU
Dell PowerScale
Score 10.0 out of 10
N/A
Dell Technologies presents Dell PowerScale (replacing EMC Isilon) as a scale-out NAS solution and server technology that provides the flexibility of a software-defined architecture with accelerated hardware innovations to harness the value of data.
Isilon Systems was acquired by EMC in 2010; some EMC Isilon NAS appliances are still available and supported under the PowerScale brand.
N/A
IBM InfoSphere Information Server
Score 8.0 out of 10
N/A
IBM InfoSphere Information Server is a data integration platform used to understand, cleanse, monitor and transform data. The offerings provide massively parallel processing (MPP) capabilities.
I also use Microsoft Azure Machine Learning in parallel with Databricks. They use different file formats, which teaches me to be flexible and to write different programs. They are equally useful to me, and I would like to master both platforms for future use. I do prefer …
Medium to large data-throughput shops will benefit the most from Databricks Spark processing. Smaller shops may find the barrier to entry a bit too high for casual use. The overhead of kicking off a Spark compute job can actually lead to your workloads taking longer, but past a certain point the performance returns cannot be beat.
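The trade-off in this review (fixed job startup overhead versus parallel speedup) can be illustrated with a toy cost model. This is only a sketch: the function names, the 60-second startup cost, and the 8x speedup are hypothetical assumptions chosen for illustration, not Databricks measurements.

```python
def single_node_seconds(work_seconds: float) -> float:
    """Time to run the workload on one machine with no startup cost."""
    return work_seconds


def cluster_seconds(work_seconds: float, startup_seconds: float, speedup: float) -> float:
    """Time on a cluster: fixed startup overhead plus the parallelised work."""
    return startup_seconds + work_seconds / speedup


def break_even_work_seconds(startup_seconds: float, speedup: float) -> float:
    """Workload size at which both options take the same time.

    Solves  w = startup + w / speedup  for w.
    """
    if speedup <= 1:
        raise ValueError("speedup must be greater than 1 for a break-even point to exist")
    return startup_seconds * speedup / (speedup - 1)


if __name__ == "__main__":
    STARTUP = 60.0  # hypothetical job startup overhead, in seconds
    SPEEDUP = 8.0   # hypothetical parallel speedup factor

    for work in (30.0, 3600.0):
        local = single_node_seconds(work)
        cluster = cluster_seconds(work, STARTUP, SPEEDUP)
        winner = "cluster" if cluster < local else "single node"
        print(f"work={work:.0f}s  single={local:.1f}s  cluster={cluster:.1f}s  faster: {winner}")

    print(f"break-even at about {break_even_work_seconds(STARTUP, SPEEDUP):.1f}s of work")
```

Under these assumed numbers, a 30-second job is slower on the cluster, while an hour-long job is far faster, which matches the reviewer's observation that the overhead only pays off past a certain workload size.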
EMC Isilon Scale-Out NAS is well suited for larger files (greater than 128 KB) and for cases where you need everything in one common namespace. It is less appropriate for many small files (millions of files under 128 KB in size): these cause the protection level to fall back to mirroring, which costs more space.
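The space cost of mirroring small files can be estimated with some simple arithmetic. The sketch below is a simplified model, not the actual OneFS layout algorithm: it assumes small files are stored as full copies and larger files are striped as N data units plus M parity units, and it ignores block-size rounding. The copy count of 3 and the 8+2 layout are hypothetical values; real figures depend on cluster size and protection policy.

```python
def mirrored_footprint(file_bytes: int, copies: int) -> int:
    """On-disk bytes when a file is stored as `copies` full mirrors."""
    return file_bytes * copies


def striped_footprint(file_bytes: int, data_units: int, parity_units: int) -> float:
    """On-disk bytes when a file is striped as data_units + parity_units."""
    return file_bytes * (data_units + parity_units) / data_units


def overhead_ratio(footprint_bytes: float, file_bytes: int) -> float:
    """Ratio of on-disk bytes to logical file bytes."""
    return footprint_bytes / file_bytes


if __name__ == "__main__":
    small_file = 64 * 1024   # 64 KiB, below the 128 KB threshold from the review
    large_file = 1024 ** 3   # 1 GiB

    small = mirrored_footprint(small_file, copies=3)                        # assumed 3x mirror
    large = striped_footprint(large_file, data_units=8, parity_units=2)     # assumed 8+2 layout

    print(f"small file overhead: {overhead_ratio(small, small_file):.2f}x")
    print(f"large file overhead: {overhead_ratio(large, large_file):.2f}x")
```

With these assumed settings the small file consumes 3.00x its logical size while the large file consumes 1.25x, which is why millions of small files use disproportionately more raw capacity.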
Information Server is extremely useful for replacing manual development work that requires a lot of coding effort. It significantly increases productivity in both the initial development and the future maintenance of processes, since it has a visual development environment with self-documentation.
Some upgrades require the entire cluster to be rebooted simultaneously. In this day and age, that should not be necessary. This is my biggest disappointment with Isilon to date.
When using multiple storage pools you have to be very careful with your capacity management. Filling up one pool can cause an overflow of data to a pool that is less performance driven. Do not underestimate your capacities or you will find yourself in a tight spot.
Block size is almost always an issue with Isilon. It does not handle all types of data well. In many cases PACS and VNA data are best stored on a different storage platform that will utilize the capacity more efficiently than Isilon can.
Deduplication seems to be less efficient on Isilon than on other platforms for similar types of data.
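The capacity-management warning about multiple storage pools can be turned into a simple utilization check. This is a minimal sketch with hypothetical pool names, sizes, and an assumed 85% alert threshold; it does not call any Isilon or PowerScale API, and the figures would need to come from your own monitoring data.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Pool:
    """A storage pool with its total and used capacity, in terabytes."""
    name: str
    capacity_tb: float
    used_tb: float

    @property
    def utilization(self) -> float:
        """Fraction of the pool's capacity that is in use (0.0 to 1.0)."""
        return self.used_tb / self.capacity_tb


def pools_at_risk(pools: Iterable[Pool], threshold: float = 0.85) -> List[str]:
    """Return names of pools whose utilization meets or exceeds the threshold."""
    return [p.name for p in pools if p.utilization >= threshold]


if __name__ == "__main__":
    # Hypothetical pools: a small fast tier and a larger, slower archive tier.
    pools = [
        Pool(name="fast", capacity_tb=100.0, used_tb=90.0),
        Pool(name="archive", capacity_tb=500.0, used_tb=200.0),
    ]
    for name in pools_at_risk(pools):
        print(f"pool '{name}' is close to full; data may spill to a slower pool")
```

Alerting well before a pool is full leaves time to add capacity or rebalance before data overflows into a lower-performance pool.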
It is an amazing platform for designing experiments and delivering deep-dive analyses that require highly complex queries. It also lets you share information and insights across the company through shared workspaces while keeping them secure.
In terms of graph generation and interaction, the UI and UX could be improved.
One of the best customer and technology support experiences I have had in my career. You get what you pay for, and here you get the Rolls-Royce. It reminds me of SAS customer support in the 2000s, when the tools were reaching some limits and their engineers wanted to know more about what we were doing, long before "data science" was even a name. Databricks truly embraces the partnership with its customers and helps them with any given challenge.
The most important differentiating factor for Databricks Lakehouse Platform from these other platforms is support for ACID transactions and the time travel feature. Also, native integration with managed MLflow is a plus. EMR, Cloudera, and Hortonworks are not as optimized when it comes to Spark Job Execution. Other platforms need to be self-managed, which is another huge hassle.
The ratio of raw disk space to logical disk space was significantly better on the Isilon. Fast cache using SSD drives for faster searching is available on the Isilon but not on the Overland solution. The Isilon solution also included faster backend switching between nodes.