We've been super happy with Astra DB. It's been extremely well-suited for our vector search needs as described in previous responses. With Astra DB’s high-performance vector search, Maester’s AI dynamically optimizes responses in real time, adapting to new user interactions without requiring costly retraining cycles.
I find Qubole is well suited for getting started analyzing data in the cloud without being locked in to a specific cloud vendor's tooling other than the underlying filesystem. Since the data itself is not isolated to any Qubole cluster, it can easily be collected back into a cloud vendor's specific tools for further analysis. I therefore find it complementary to offerings such as Amazon EMR or Google Dataproc.
We need to be able to process a lot of data (our biggest clients process hundreds of millions of transactions every month). However, it is not only the amount of data; the load is also unpredictable, with spikes occurring at different points in time. That is something Astra is great at handling.
Our processing needs to be extremely fast. Some of our clients use our enrichment in a synchronous way, meaning that any delay in processing holds up the whole transaction lifecycle and can have a major impact on the client. Astra is very fast.
A close collaboration with GCP makes our life very easy. All of our technology sits in Google Cloud, so having Astra in there makes it a no-brainer solution for us.
The support team sometimes needs a ticket to be escalated before it responds in a timely manner. I will say, once the ticket is escalated, action is taken.
They require better documentation on the migration of data. The three primary methods for migrating large data volumes are bulk, Cassandra Data Migrator, and ZDM (Zero Downtime Migration Utility). Over time I have become very familiar with all three of these methods; however, while working with the Services team and the support team, it seemed like we were breaking new ground. I feel that if the utilities were better documented and included some examples and/or use cases from large data migrations, this process would have been easier. One lesson learned is that you likely need to migrate your application servers to the same cloud provider you host Astra on; otherwise, the latency is too high for latency-sensitive applications.
Providing an open selection of all cloud provider instance types with no explanation of their ideal use cases causes too much confusion for new users setting up a new cluster. For example, not everyone knows that Amazon's R and X-series models are memory optimized, while the C-series is compute optimized and the M-series is general purpose.
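As a rough illustration of the kind of guidance that would help here, the sketch below maps an AWS instance type to its broad family category. It is only an assumption of how such a hint could look, not anything Qubole provides: the `describe_instance_type` helper and the lookup table are hypothetical, and the table covers only a handful of common families.

```python
import re

# Hypothetical lookup, not part of Qubole: maps the letter prefix of an
# AWS EC2 instance type to its broad family category. Only a few common
# families are listed; anything else is reported as unknown.
INSTANCE_FAMILY_CATEGORY = {
    "t": "general purpose (burstable)",
    "m": "general purpose",
    "c": "compute optimized",
    "r": "memory optimized",
    "x": "memory optimized",
    "i": "storage optimized",
}


def describe_instance_type(instance_type: str) -> str:
    """Return the broad category for a type such as 'r5.xlarge'."""
    # Take the letters before the first digit, e.g. 'r' from 'r5.xlarge'.
    match = re.match(r"[a-z]+", instance_type.strip().lower())
    if match is None:
        return "unknown family"
    return INSTANCE_FAMILY_CATEGORY.get(match.group(0), "unknown family")


if __name__ == "__main__":
    for name in ("r5.xlarge", "c5.large", "m5.2xlarge", "inf1.xlarge"):
        print(f"{name}: {describe_instance_type(name)}")
```

Even a short hint like this next to each choice in the cluster setup form would make it harder to pick the wrong hardware by accident.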
I would like to see more ETL tools provided, other than DistCp, that allow one to move data between Hadoop filesystems.
From the cluster administration side, onboarding new users at large companies seems troublesome, especially when trying to create an individual cluster per team within the company. It should also be possible to debug and share code and queries with users on other teams and clusters.
Personally, I have no issues using Amazon EMR with Hue and Zeppelin, for example, for data science and exploratory analysis. The benefits of using Qubole are that it offers additional tooling that may not be available in other cloud providers without manual installation, and that it offers auto-terminating instances and scaling groups.
Their response time is fast. Even if you contact them outside business hours, they follow up on your case very well. They also facilitate video calls if necessary for debugging.
Graph, search, analytics, administration, developer tooling, and monitoring are all incorporated into a single platform by Astra DB. MongoDB is a self-managed infrastructure. Astra DB is a wide-column store and MongoDB is a document store. The best thing is that Astra DB runs on Java while MongoDB runs on C++.
Upper management chose Qubole over these competing offerings. I find that Databricks has a better Spark offering compared to Qubole's Zeppelin notebooks.
We are well aware of the Cassandra architecture and familiar with the open source tooling that DataStax provides the industry (K8ssandra / Stargate) to scale Cassandra on Kubernetes.
Having prior knowledge of Cassandra / Kubernetes means we know that under the hood Astra is built on infinitely scalable technologies. We trust that the foundations that Astra is built on will scale so we know Astra will scale.
We like to say that Qubole has allowed for "data democratization", meaning that each team is responsible for its own set of tooling and use cases rather than being limited by versions established by products such as Hortonworks HDP or Cloudera CDH.
One negative impact is that users have over-provisioned clusters without realizing it, and end up paying for it. When setting up a new cluster, there are too many choices to pick from, and data scientists may not understand the instance types or hardware specs for the datasets they need to operate on.