TrustRadius Insights for Apache Cassandra are summaries of user sentiment data from TrustRadius reviews and, when necessary, third party data sources.
Pros
Strong community and adoption: The Java-based NoSQL database has garnered a strong following and wide adoption. Many users have found it to be a highly popular choice among developers, who benefit from the extensive support and resources available.
Excellent integration with Apache Hadoop, Apache Spark, and Solr: Reviewers have consistently praised the database for its excellent integration capabilities with Apache Hadoop, Apache Spark, and Solr. This seamless integration provides a robust ecosystem of tools that enable efficient unit tests and stress testing.
Best-in-class performance across various workloads: Users have consistently highlighted the exceptional performance of this database across various read/write/mixed workloads. Its ability to provide low latency and high throughput has been widely appreciated by customers who require fast data retrieval and processing.
Cassandra is currently used for our enterprise eCommerce platform. So far our experience with Cassandra has been good. It's an extremely powerful NoSQL database: a high-performance, distributed, scalable, and highly available database platform.
Pros
Continuous data availability is an extremely powerful feature of Cassandra.
Overall, a cost-effective and low-maintenance database platform.
High-performance, low tolerance NoSQL database.
Cons
Moving data between Cassandra and any relational database platform could be improved.
Database event logging could be handled more efficiently.
Likelihood to Recommend
It's perfect for big data or high-volume data: loading log files, event files, and streaming or video/image data. It gives really high performance when dealing with big data fetches. But when you need to make table joins or you need more of a relational data structure, I do not think Cassandra is a good fit.
It's one of the database platforms we offer to the development community in our organization. We have various options when it comes to databases, including DB2, SQL Server, Oracle, and Hadoop for data warehousing. Cassandra becomes the choice when developers want to use a highly available NoSQL database.
Pros
Availability
Fast performance
Horizontal scalability
Memory first
Partition based
Cons
Dealing with tombstones
Maintenance/upgrade
Compaction and repair
Likelihood to Recommend
We use it for collecting user preferences on our website, which can be quickly reused. It's also well suited for document ID lookup systems. It's not good for information that requires a high consistency level, such as an account balance in a banking system.
We use Cassandra as the NoSQL database for our use cases. We stream a lot of API data into this database and rely on the availability it gives us. It has proven to be consistent, which we use to our advantage. Cassandra can distribute data across multiple machines in an app-transparent manner, thus helping us to expand it on demand.
Pros
Cassandra has a masterless design and is hence massively scalable. It is great for applications and use cases that cannot afford to lose data. There is no single point of failure.
You can add more nodes to Cassandra to linearly increase your transactions/requests. Also, it has great support across cloud regions and data centers.
Cassandra provides features like tunable consistency, data compression, and CQL (Cassandra Query Language), which we use.
Cons
The underlying medium of Cassandra is a key-value store. So when you model your data, it is based on how you would want to query it and not how the data is structured. This results in a repetition of data when storing. Hence, there is no referential integrity - there is no concept of JOIN connections in Cassandra.
Data aggregation functions like SUM, MIN, MAX, AVG, and others are very costly, even where possible. Hence ad-hoc querying or analysis is difficult.
Likelihood to Recommend
You should be very clear where you want to use Cassandra because there is no referential integrity (JOIN) in Cassandra. You have to model data based on how you want to query it, hence what use cases it can be used for should be considered carefully.
You can use it where you want to store log or user-behavior types of data. You can use it in heavy-write or time-series data storage. It is good in retail applications for fast product catalog inputs and lookups.
Cassandra is a NoSQL database which is used to store a large amount of data quickly. It has a very fast write speed, allowing a large volume of data storage within a small amount of time. It is tunable and can be used to store data. It is more suitable for storing flat data rather than relational data.
Pros
Write speed. Cassandra is very fast while writing data due to its unique architecture.
Tunable consistency - During data replication, consistency can be tuned for a particular data set to be available during an outage.
CQL - Cassandra Query Language is a subset of SQL and eases the transition from a more traditional database.
Cons
Aggregation functions are not very efficient.
Ad-hoc queries do not perform well. Only queries that were anticipated while designing the database perform well.
Performance is unpredictable.
Likelihood to Recommend
Cassandra is well suited to storing a large volume of data within a very small period of time. It is relatively fast, and data consistency can be tuned per dataset for custom availability during an outage. It can be interacted with using CQL (Cassandra Query Language), which is similar to SQL, so the transition is easier. However, it performs less well during aggregation and querying.
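The tunable consistency mentioned in this review comes down to simple replica arithmetic. The following is a minimal sketch of that arithmetic only, not of any driver API; the function names are hypothetical, and it assumes the commonly cited rule that reads are guaranteed to see the latest write when the number of replicas read plus the number written exceeds the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a QUORUM read or write."""
    return replication_factor // 2 + 1


def overlaps(read_replicas: int, write_replicas: int, replication_factor: int) -> bool:
    """True when every read set must intersect every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


def tolerated_failures(required_replicas: int, replication_factor: int) -> int:
    """Replicas that can be down while the operation can still succeed."""
    return replication_factor - required_replicas


RF = 3  # assumed replication factor for the example
q = quorum(RF)
print(q)                          # 2
print(overlaps(q, q, RF))         # True: QUORUM reads + QUORUM writes
print(overlaps(1, 1, RF))         # False: ONE + ONE trades consistency for availability
print(tolerated_failures(q, RF))  # 1
```

Lower levels keep a dataset available with more replicas down, which is the trade-off the reviewer describes for outages.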
Cassandra is used in my organization by my department to handle data that is not in a standard RDBMS format.
Pros
Runs on commodity hardware
Built-in fault tolerance
Can grow horizontally
Cons
It is a bit difficult for people that come from the SQL world.
Managing anti-entropy repair is still a bit of a challenge.
Security patches could be better.
Likelihood to Recommend
Nothing beats software that works and charges nothing. It handles data that is not fit for traditional RDBMS. However, not a lot of employees know how to use it efficiently.
We used Cassandra to build a fully functional POC (fed continuously with production-level data volumes) for a shipment cloud concept for FedEx's EMEA region. The solution has two parts: an IMDG product keeps the latest status of every shipment, while Cassandra serves as long-term transaction storage for all historical shipment status update events. On top of the in-memory and NoSQL stores, we built one unified RESTful service which, depending on the user's query, queries the IMDG for the latest status of a shipment, Cassandra for its history, or both. Cassandra also serves as the backup for the IMDG in case the IMDG crashes completely (the worst-case scenario). Thanks to the time-series way of persisting data in Cassandra, we can still extract the latest status of a shipment from Cassandra's full transaction history with reasonable performance (slower than the IMDG, but much quicker than a traditional relational database).
Pros
Cassandra is very strong at storing time-series transaction data. Simply by reversing the time-series order when creating the table, we can very quickly fetch the latest records even from millions of associated transactions, because the latest record is always at the top of the search. Combined with the TTL feature of Cassandra columns, it is easy to delete old data automatically.
Cassandra combines the key-value store design from Amazon's Dynamo with the column family data model from Google's Bigtable, which makes it easy to manage both structured and unstructured data models efficiently.
With the Solr integration provided by DataStax Enterprise, it can even handle some ad-hoc query needs that were not fully taken into account at the start of the project, when the tables were created. This adds considerable flexibility for a large enterprise or project that needs it in practice.
The linear scalability provided by Cassandra allows us to easily scale the cluster up or down by simply adding or removing servers.
Cassandra's throughput for both reads and writes is quite good.
Cons
Managing a big Cassandra cluster, even with DataStax Enterprise, is still quite challenging for a maintenance team, considering the frequent version upgrades (even in rolling fashion) and the even more frequent auto-repair. In my view, a powerful tool should be provided to automate this process as much as possible.
The TTL design is good. The pain is that a TTL set on data that has already been inserted cannot simply be updated; the data has to be reinserted. This causes a lot of issues when a change in business strategy requires the purge strategy to change as well.
Because Cassandra is Java-based, GC sometimes eats into performance. If Cassandra allowed more use of non-heap memory, GC effort would be reduced, freeing up more of the hardware's power.
A default indexing strategy for JSON-formatted data is not available in DataStax's Solr integration. At the moment we have to implement our own to support the JSON text we store. We extract the key fields from our data that might need to be searched ad hoc, convert them into JSON format (a one-level map only), and save them into a Cassandra column. On top of that, we want Solr to index the key of each token.
Likelihood to Recommend
For scenarios that need ACID support, Cassandra may not be the best choice, but for an insert-only (time-series based) transaction case, with a requirement to cope with unpredictable future changes to the data model or structure, Cassandra is one of the best options. If you only use the open source version of Cassandra, without Solr integrated, you need to know your search queries before you create the table. If that's not possible, then Cassandra or another NoSQL DB might not be the right choice.
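The newest-first time-series layout and TTL behavior described in this review can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the reviewer's implementation and not Cassandra itself: the class `ShipmentPartition`, its fields, and the status values are hypothetical, and one object stands in for one partition (one shipment). In CQL, the corresponding table options would be a descending clustering order on the event time and a time-to-live on the written data.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class ShipmentPartition:
    """Toy model of one partition: events kept newest-first, each with an expiry."""
    ttl_seconds: int
    # Rows are (-written_at, expires_at, status), so ascending sort order = newest first.
    rows: list = field(default_factory=list)

    def insert(self, written_at: int, status: str) -> None:
        # The expiry is fixed at write time; changing it later means reinserting the row.
        bisect.insort(self.rows, (-written_at, written_at + self.ttl_seconds, status))

    def latest(self, now: int):
        """Return the newest unexpired status, or None if everything has expired."""
        for _, expires_at, status in self.rows:
            if expires_at > now:
                return status
        return None

    def history(self, now: int) -> list:
        """Return all unexpired statuses, newest first."""
        return [status for _, expires_at, status in self.rows if expires_at > now]


p = ShipmentPartition(ttl_seconds=100)
p.insert(10, "PICKED_UP")
p.insert(50, "IN_TRANSIT")
p.insert(90, "DELIVERED")
print(p.latest(now=95))    # DELIVERED
print(p.history(now=120))  # ['DELIVERED', 'IN_TRANSIT'] (the first event expired at 110)
```

Because the newest row sorts first, reading the latest status stops at the first live row instead of scanning the full history.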
We are using Cassandra based on the requirements and data availability to the application (based on queries for search).
Pros
Cassandra has a lot of APIs readily available for map-reduce queries (like materialized queries).
Cassandra uses ring architecture approach, there is no master-slave approach (like HBase). If data is published on the node, the data will get synced with other nodes in the ring architecture, compared to HBase which has a dedicated master node to orchestrate the data into its slaves.
Write Speed
Multi Data Center Replication
Tunable Consistency
Integrates with JVM because it's written in Java
Cassandra Query Language is a subset of SQL (smaller learning curve)
Cons
No Ad-Hoc Queries: Cassandra data storage layer is basically a key-value storage system. This means that you must "model" your data around the queries you want to surface, rather than around the structure of the data itself.
There are no aggregation queries available in Cassandra.
Not fit for transactional data.
Likelihood to Recommend
Cassandra data storage layer is basically a key-value storage system. This means that you must model your data around the queries you want to surface, rather than around the structure of the data itself. This can lead to storing the data multiple times in different ways to be able to satisfy the requirements of your application.
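Modeling data around queries, as described above, can be sketched with two plain dictionaries standing in for two tables. This is an illustrative assumption, not Cassandra's API: the table names `orders_by_customer` and `orders_by_product`, the function `record_order`, and the sample data are all hypothetical.

```python
from collections import defaultdict

# Each "table" maps a partition key to its rows; one table per query pattern.
orders_by_customer = defaultdict(list)
orders_by_product = defaultdict(list)


def record_order(order_id: str, customer_id: str, product_id: str) -> None:
    """Write the same order once per query pattern, since there are no JOINs."""
    row = {"order_id": order_id, "customer_id": customer_id, "product_id": product_id}
    orders_by_customer[customer_id].append(dict(row))  # copy: data is stored twice
    orders_by_product[product_id].append(dict(row))


record_order("o1", "alice", "widget")
record_order("o2", "alice", "gadget")
record_order("o3", "bob", "widget")

# Each lookup touches exactly one partition of one table.
print([r["order_id"] for r in orders_by_customer["alice"]])  # ['o1', 'o2']
print([r["order_id"] for r in orders_by_product["widget"]])  # ['o1', 'o3']
```

The cost is the one the reviewer names: every new query pattern needs its own copy of the data, and a query that was not planned for has no table to read from.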
Cassandra is an open-source NoSQL database solution offered by Apache. What's nice about Cassandra is its ability to host the data in multiple nodes in a ring, and changes made to a node in the ring will shard the update to the rest. For geographically dispersed architecture requiring local database storage, this can be a valuable asset which makes this NoSQL option stand above the rest.
Pros
Cassandra can perform reads and writes very quickly
Nodes in a ring will keep up to date by sharding information to each other
Cassandra is well suited for scalable applications needing keyspace storage
Cons
Cassandra's query language is clunky, which is likely due to the nature of NoSQL.
Lacking the ability to relate data between sets makes querying harder, but this again is the nature of NoSQL.
Likelihood to Recommend
Cassandra is suited for applications that need quick read and write abilities. The key to column family relationship allows for super quick lookup and inserts. The nature of the ring cluster allows for fault tolerance, as well as geo-redundant storage. Cassandra is not well suited when needing to use the data to make relational inferences.
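How a ring of nodes provides fault tolerance can be sketched with a simplified token ring. This is a toy model under explicit assumptions: one token per node, MD5 as a stand-in hash, and replicas chosen by walking clockwise from the key's position. The class `Ring` and the node names are hypothetical, and real clusters use a different partitioner and placement strategy.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is only a stand-in hash)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        # One (token, node) entry per node, sorted by ring position.
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, partition_key: str) -> list:
        """Walk clockwise from the key's token and collect distinct nodes."""
        start = bisect.bisect_left(self.tokens, token(partition_key))
        count = min(self.replication_factor, len(self.entries))
        return [self.entries[(start + i) % len(self.entries)][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("user:42")
print(owners)  # three distinct nodes; which ones depends on the hash

# If any single node is down, at least two replicas of the key remain reachable.
down = owners[0]
print([n for n in owners if n != down])
```

No node plays a coordinating master role in this model: any node can compute the same replica list from the key alone.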
Used for a specific product (which is used by the whole organization). It addresses our need for a column store for the uniqueness of proprietary information, which Redis and Mongo do not support.
Pros
Masterless
Schema-less
Multiple datacenter usage w/ little or no data loss
Cons
Rebuild/repair of objects (tables) in the keyspaces; allow keyspaces to be ignored during repair.
Monitoring tool: OpsCenter support for Cassandra 3.x (or some other open source tool)
A UI browser-type tool to view data (rather than cqlsh)
Likelihood to Recommend
[Cassandra is well suited to] schema-less datasets for large key-value stores.
We wanted to use Cassandra to load millions of metrics we collect daily from our user base. After we collected the data we also needed to perform calculations and run "sql" like queries. The only database that came to mind, and does all those things well, is Cassandra.
Pros
Automatic data sharding between nodes
High availability
Python driver support
Cons
Managing Cassandra nodes (adding, removing)
Need a separate tool to have a console (DataStax OpsCenter)
Likelihood to Recommend
Cassandra performed very well when we were writing ~300 GB of data per day on a 3-node cluster. When we read the data instead, we found minor performance issues, which we expected. But for applications that are very read-heavy, we would choose a different product such as Couchbase.