Databricks in San Francisco offers the Databricks Lakehouse Platform (formerly the Unified Analytics Platform), a data science platform and Apache Spark cluster manager. The Databricks Unified Data Service aims to provide a reliable and scalable platform for data pipelines, data lakes, and data platforms. Users can manage full data journey, to ingest, process, store, and expose data throughout an organization. Its Data Science Workspace is a collaborative environment for practitioners to run…
$0.07
Per DBU
Elasticsearch
Score 8.8 out of 10
N/A
Elasticsearch is an enterprise search tool from Elastic in Mountain View, California.
Medium to Large data throughput shops will benefit the most from Databricks Spark processing. Smaller use cases may find the barrier to entry a bit too high for casual use cases. Some of the overhead to kicking off a Spark compute job can actually lead to your workloads taking longer, but past a certain point the performance returns cannot be beat.
Elasticsearch is a really scalable solution that can fit a lot of needs, but the bigger and/or those needs become, the more understanding & infrastructure you will need for your instance to be running correctly. Elasticsearch is not problem-free - you can get yourself in a lot of trouble if you are not following good practices and/or if are not managing the cluster correctly. Licensing is a big decision point here as Elasticsearch is a middleware component - be sure to read the licensing agreement of the version you want to try before you commit to it. Same goes for long-term support - be sure to keep yourself in the know for this aspect you may end up stuck with an unpatched version for years.
As I mentioned before, Elasticsearch's flexible data model is unparalleled. You can nest fields as deeply as you want, have as many fields as you want, but whatever you want in those fields (as long as it stays the same type), and all of it will be searchable and you don't need to even declare a schema beforehand!
Elastic, the company behind Elasticsearch, is super strong financially and they have a great team of devs and product managers working on Elasticsearch. When I first started using ES 3 years ago, I was 90% impressed and knew it would be a good fit. 3 years later, I am 200% impressed and blown away by how far it has come and gotten even better. If there are features that are missing or you don't think it's fast enough right now, I bet it'll be suitable next year because the team behind it is so dang fast!
Elasticsearch is really, really stable. It takes a lot to bring down a cluster. It's self-balancing algorithms, leader-election system, self-healing properties are state of the art. We've never seen network failures or hard-drive corruption or CPU bugs bring down an ES cluster.
Connect my local code in Visual code to my Databricks Lakehouse Platform cluster so I can run the code on the cluster. The old databricks-connect approach has many bugs and is hard to set up. The new Databricks Lakehouse Platform extension on Visual Code, doesn't allow the developers to debug their code line by line (only we can run the code).
Maybe have a specific Databricks Lakehouse Platform IDE that can be used by Databricks Lakehouse Platform users to develop locally.
Visualization in MLFLOW experiment can be enhanced
Because it is an amazing platform for designing experiments and delivering a deep dive analysis that requires execution of highly complex queries, as well as it allows to share the information and insights across the company with their shared workspaces, while keeping it secured.
in terms of graph generation and interaction it could improve their UI and UX
To get started with Elasticsearch, you don't have to get very involved in configuring what really is an incredibly complex system under the hood. You simply install the package, run the service, and you're immediately able to begin using it. You don't need to learn any sort of query language to add data to Elasticsearch or perform some basic searching. If you're used to any sort of RESTful API, getting started with Elasticsearch is a breeze. If you've never interacted with a RESTful API directly, the journey may be a little more bumpy. Overall, though, it's incredibly simple to use for what it's doing under the covers.
One of the best customer and technology support that I have ever experienced in my career. You pay for what you get and you get the Rolls Royce. It reminds me of the customer support of SAS in the 2000s when the tools were reaching some limits and their engineer wanted to know more about what we were doing, long before "data science" was even a name. Databricks truly embraces the partnership with their customer and help them on any given challenge.
We've only used it as an opensource tooling. We did not purchase any additional support to roll out the elasticsearch software. When rolling out the application on our platform we've used the documentation which was available online. During our test phases we did not experience any bugs or issues so we did not rely on support at all.
The most important differentiating factor for Databricks Lakehouse Platform from these other platforms is support for ACID transactions and the time travel feature. Also, native integration with managed MLflow is a plus. EMR, Cloudera, and Hortonworks are not as optimized when it comes to Spark Job Execution. Other platforms need to be self-managed, which is another huge hassle.
As far as we are concerned, Elasticsearch is the gold standard and we have barely evaluated any alternatives. You could consider it an alternative to a relational or NoSQL database, so in cases where those suffice, you don't need Elasticsearch. But if you want powerful text-based search capabilities across large data sets, Elasticsearch is the way to go.
We have had great luck with implementing Elasticsearch for our search and analytics use cases.
While the operational burden is not minimal, operating a cluster of servers, using a custom query language, writing Elasticsearch-specific bulk insert code, the performance and the relative operational ease of Elasticsearch are unparalleled.
We've easily saved hundreds of thousands of dollars implementing Elasticsearch vs. RDBMS vs. other no-SQL solutions for our specific set of problems.