TrustRadius
Cassandra
67 Ratings
Score 8.2 out of 10

Cassandra Reviews

Reviews (1-16 of 16)

Priti Asai / Thakkar
June 30, 2019

Cassandra Review: "One of the Best NoSQL Databases!"

Score 8 out of 10
Vetted Review
Verified User
Cassandra is currently used for our enterprise eCommerce platform. So far our experience with Cassandra has been good. It's an extremely powerful NoSQL database: a high-performance, distributed, scalable, and highly available platform.
  • Continuous data availability is an extremely powerful feature of Cassandra.
  • Overall, a cost-effective and low-maintenance database platform.
  • High performance and low tolerance NoSQL database.
  • Moving data between Cassandra and relational database platforms could be improved.
  • Database event logging could be handled more efficiently.
It's perfect for big data or high-volume data: loading log files, event files, and streaming or video/image data. It gives really high performance when dealing with big data fetches. But when you need table joins or a more relational data structure, I do not think Cassandra is a good fit.
Glen Kim
March 16, 2019

User Review: "Cassandra at scale"

Score 8 out of 10
Vetted Review
Verified User
It’s one of the database platforms we offer to the development community in our organization. We have various options when it comes to databases, including DB2, SQL Server, Oracle, and Hadoop for data warehousing. Cassandra becomes the choice when developers want to use a highly available NoSQL database.
  • Availability
  • Fast performance
  • Horizontal scalability
  • Memory first
  • Partition based
  • Dealing with tombstones
  • Maintenance/upgrade
  • Compaction and repair
We use it for collecting user preferences on our website, which can be quickly reused. It's also well suited for document ID lookup systems. It’s not a good fit for information that needs a high level of consistency, such as account balances in a banking system.
Feng Cai
February 26, 2019

Cassandra Review: "Pretty good software"

Score 8 out of 10
Vetted Review
Verified User
Cassandra is used in my organization by my department to handle data that is not in a standard RDBMS format.
  • Runs on commodity hardware
  • Built-in fault tolerance
  • Can grow horizontally
  • It is a bit difficult for people who come from the SQL world.
  • Managing anti-entropy repair is still a bit of a challenge.
  • Better security patches.
Nothing beats software that works and costs nothing. It handles data that is not a fit for a traditional RDBMS. However, not many employees know how to use it efficiently.
Dhruba Jyoti Nag
March 06, 2019

User Review: "Cassandra - a tunable NoSQL datastore"

Score 9 out of 10
Vetted Review
Verified User
Cassandra is a NoSQL database which is used to store a large amount of data quickly. It has a very fast write speed, allowing a large volume of data storage within a small amount of time. It is tunable and can be used to store data. It is more suitable for storing flat data rather than relational data.
  • Write speed. Cassandra is very fast while writing data due to its unique architecture.
  • Tunable consistency - During data replication, consistency can be tuned for a particular data set to be available during an outage.
  • CQL - cassandra query language is a subset of SQL and eases the transition from a more traditional database.
  • Aggregation functions are not very efficient.
  • Ad-hoc queries do not perform well. Queries which were visualized while designing the databases only perform well.
  • Performance is unpredictable.
Cassandra is well suited to storing a large volume of data within a very short period of time. It is relatively fast, and data consistency can be tuned per dataset to keep it available during an outage. You interact with it using CQL (Cassandra Query Language), which is similar to SQL, so the transition is easier. However, it performs less well on aggregation and ad-hoc querying.
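The tunable consistency mentioned above can be illustrated with a small sketch. It is a toy model, not driver code: the function names are hypothetical, and it only encodes the commonly cited rule that a read is guaranteed to see the latest write when the read and write replica counts together exceed the replication factor.

```python
def replicas_required(level: str, rf: int) -> int:
    """Replicas that must respond for a consistency level, given replication factor rf."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported level in this sketch: {level}")


def read_sees_latest_write(write_level: str, read_level: str, rf: int) -> bool:
    """True when every read replica set must overlap every write replica set (R + W > RF)."""
    return replicas_required(write_level, rf) + replicas_required(read_level, rf) > rf


def available_with_nodes_down(level: str, rf: int, down: int) -> bool:
    """True when enough replicas remain to satisfy the level with `down` replicas offline."""
    return rf - down >= replicas_required(level, rf)
```

With a replication factor of 3, QUORUM reads and writes overlap and still tolerate one replica being down, while ONE/ONE stays available with two replicas down but gives up the overlap guarantee. That is the trade-off being tuned.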
March 15, 2019

Review: "Cassandra: A highly available and scalable database"

Score 8 out of 10
Vetted Review
Verified User
Review Source
We use Cassandra as the NoSQL database for our use cases. We stream a lot of API data into this database and rely on the availability it gives us. It has proven to be consistent, which we use to our advantage. Cassandra can distribute data across multiple machines in an app-transparent manner, thus helping us to expand it on demand.
  • Cassandra is a masterless design, hence massively scalable. It is great for applications and use cases that cannot afford to lose data. There is no single point of failure.
  • You can add more nodes to Cassandra to linearly increase your transactions/requests. Also, it has great support across cloud regions and data centers.
  • Cassandra provides features like tunable consistency, data compression, and CQL (Cassandra Query Language), which we use.
  • The underlying medium of Cassandra is a key-value store. So when you model your data, it is based on how you would want to query it and not how the data is structured. This results in a repetition of data when storing. Hence, there is no referential integrity - there is no concept of JOIN connections in Cassandra.
  • Data aggregation functions like SUM, MIN, MAX, and AVG are very costly, if they are possible at all. Hence ad-hoc querying or analysis is difficult.
You should be very clear where you want to use Cassandra because there is no referential integrity (JOIN) in Cassandra. You have to model data based on how you want to query it, hence what use cases it can be used for should be considered carefully.

You can use it where you want to store log or user-behavior types of data. You can use it for write-heavy or time-series data storage. It is good in retail applications for fast product catalog inputs and lookups.
yixiang Shan
October 30, 2017

Review: "Cassandra, put into the real business context"

Score 8 out of 10
Vetted Review
Verified User
Review Source
We use Cassandra to build a fully functional POC (fed continuously with production-level data volumes) for a shipment cloud concept for FedEx's EMEA region. The solution has two parts: an IMDG product keeps the latest status of every shipment, while Cassandra serves as long-term transaction storage for all historical shipment status update events. On top of the in-memory and NoSQL storage, we built one unified RESTful service which, depending on the user's query, reads the latest status of a shipment from the IMDG, the history of the shipment from Cassandra, or both. Cassandra also acts as the backup for the IMDG in case the IMDG crashes completely (the worst-case scenario). Thanks to the time-series way of persisting data in Cassandra, we can still extract the latest status of a shipment from Cassandra's full transaction history with reasonable performance (slower than the IMDG but much quicker than a traditional relational database).
  • Cassandra is very strong at storing time-series transaction data. Simply by reversing the time order when creating the table, we can very quickly fetch the "latest" records even from millions of associated transactions, because the latest record is always at the top of the search. Combined with the TTL feature of Cassandra columns, it is easy to delete old data automatically.
  • Cassandra combines the key-value store design of Amazon's Dynamo with the column family data model of Google's Bigtable, which makes it easy to manage both structured and unstructured data models efficiently.
  • The Solr integration provided by DataStax Enterprise can even cover some ad-hoc query needs that were not fully taken into account when the tables were created at the beginning of the project. This gives a large enterprise or project considerably more room to maneuver when it needs flexibility in practice.
  • The linear scalability of Cassandra lets us easily scale the cluster up or down simply by adding or removing servers.
  • Both read and write throughput are quite good.
  • Managing a big Cassandra cluster, even with DataStax Enterprise, is still quite challenging for a maintenance team, considering the frequent version upgrades (even when done in a rolling fashion) and the even more frequent auto-repair. In my view, a powerful tool should be provided to automate this process as much as possible.
  • The TTL design is good, but the pain is that a TTL set on data already inserted cannot simply be updated; the data has to be reinserted. This causes a lot of issues when a change in business strategy requires the purge strategy to change as well.
  • As Cassandra is Java based, GC sometimes eats into performance. If Cassandra allowed more use of non-heap memory, GC effort would drop and more of the hardware's power would be freed up.
  • The DataStax Solr integration offers no default indexing strategy for JSON-formatted data, so at the moment we have to implement our own to support the JSON text we store. We extract the key fields that might need to be searched ad hoc, convert them into JSON (a one-level map only), and save them in a Cassandra column. On top of that, we want Solr to index the key of each token.
For scenarios that need ACID support, Cassandra may not be the best choice, but for an insert-only (time-series) transaction case that must also cope with unpredictable future changes to the data model or structure, Cassandra is one of the best options. If you only use the open-source version of Cassandra, without the Solr integration, you need to know your search queries before you create the table; if that's not possible, Cassandra or another NoSQL DB might not be the right choice.
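The time-series pattern described in this review (rows kept newest-first, with a TTL that expires old rows) can be sketched in plain Python. This is a toy model under stated assumptions, not Cassandra code: the class `ShipmentHistory` and its methods are hypothetical, timestamps are plain integers, and each row is assumed to be written at its own timestamp so that it expires at timestamp + TTL.

```python
import bisect


class ShipmentHistory:
    """Toy stand-in for one partition whose rows are kept newest-first."""

    def __init__(self):
        self._sort_keys = []  # negated timestamps, so ascending order = newest first
        self._rows = []       # (timestamp, status, expires_at or None)

    def insert(self, timestamp, status, ttl=None):
        # Assumption: the row is written at `timestamp`, so it expires at timestamp + ttl.
        expires_at = None if ttl is None else timestamp + ttl
        index = bisect.bisect_left(self._sort_keys, -timestamp)
        self._sort_keys.insert(index, -timestamp)
        self._rows.insert(index, (timestamp, status, expires_at))

    def history(self, now):
        """All unexpired rows, newest first."""
        return [(ts, status) for ts, status, expires_at in self._rows
                if expires_at is None or expires_at > now]

    def latest(self, now):
        """The newest unexpired row, found by scanning from the top."""
        for ts, status, expires_at in self._rows:
            if expires_at is None or expires_at > now:
                return ts, status
        return None
```

Because the rows are already ordered newest-first, `latest` normally stops at the first row it inspects, which is the effect the reviewer attributes to reversing the time order. The sketch also mirrors the TTL complaint: the expiry is fixed when the row is inserted, so changing it means inserting the row again.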
Ravi Reddy
September 27, 2017

User Review: "Cassandra Usage and Needs"

Score 8 out of 10
Vetted Review
Verified User
Review Source
We are using Cassandra based on the requirements and data availability to the application (based on queries for search).
  • Cassandra has a lot of APIs readily available for map-reduce style queries (like materialized queries).
  • Cassandra uses a ring architecture; there is no master-slave approach (as in HBase). Data published to one node is synced to the other nodes in the ring, whereas HBase has a dedicated master node that orchestrates data onto its slaves.
  • Write Speed
  • Multi Data Center Replication
  • Tunable Consistency
  • Integrates with JVM because it's written in Java
  • Cassandra Query Language resembles a subset of SQL (gentler learning curve)
  • No Ad-Hoc Queries: Cassandra's data storage layer is basically a key-value storage system. This means that you must "model" your data around the queries you want to surface, rather than around the structure of the data itself.
  • There are no aggregation queries available in Cassandra.
  • Not fit for transactional data.
Cassandra's data storage layer is basically a key-value storage system. This means that you must model your data around the queries you want to surface, rather than around the structure of the data itself. This can lead to storing the data multiple times in different ways to be able to satisfy the requirements of your application.
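A minimal sketch of what "modeling around the queries" can look like, using plain Python dictionaries in place of tables. The table and field names (`orders_by_customer`, `orders_by_product`, `record_order`) are hypothetical and chosen only for illustration; the point is that the same order is written once per query it must serve.

```python
from collections import defaultdict

# Two hypothetical "tables", each keyed for exactly one query.
orders_by_customer = defaultdict(list)  # partition key: customer_id
orders_by_product = defaultdict(list)   # partition key: product_id


def record_order(order_id, customer_id, product_id):
    """Write the same order twice, once per access pattern (deliberate duplication)."""
    row = {"order_id": order_id, "customer_id": customer_id, "product_id": product_id}
    orders_by_customer[customer_id].append(row)
    orders_by_product[product_id].append(dict(row))


def orders_for_customer(customer_id):
    """Single-key lookup; no join is needed because the data is already shaped for it."""
    return [row["order_id"] for row in orders_by_customer[customer_id]]


def orders_for_product(product_id):
    """The second query is served by the second copy of the data."""
    return [row["order_id"] for row in orders_by_product[product_id]]
```

Each read touches one key, which is why the anticipated queries are fast, and a query nobody planned for has no table to read from, which is the ad-hoc limitation several reviewers describe.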
Anson Abraham
March 16, 2017

"review of cassandra"

Score 7 out of 10
Vetted Review
Verified User
Review Source
Used for a specific product (which is used by the whole organization). It addresses our need for a column store that guarantees uniqueness of proprietary information, which Redis and Mongo do not support.
  • Masterless
  • Schema-less
  • Multiple datacenter usage w/ little or no data loss
  • Rebuild/repair of objects (tables) in keyspaces: allow specific keyspaces to be excluded from repair.
  • Monitoring tool: OpsCenter support for Cassandra 3.x (or some other open-source tool)
  • A browser-type UI to view data (rather than cqlsh)
[Cassandra is well suited to] schema-less dataset for large key value stores.
Abdel Kamel
July 13, 2016

User Review: "Cassandra, a highly scalable NoSQL DB"

Score 8 out of 10
Vetted Review
Verified User
Review Source
We wanted to use Cassandra to load millions of metrics we collect daily from our user base. After collecting the data, we also needed to perform calculations and run SQL-like queries. The only database that came to mind, and does all those things well, is Cassandra.
  • Automatic data sharding between nodes
  • High availability
  • Python driver support
  • Managing Cassandra nodes (adding, removing)
  • Need a separate tool to have a console (DataStax OpsCenter)
Cassandra performed very well when we were writing ~300 GB of data per day on a 3-node cluster. When we read the data instead, we found minor performance issues, which we had expected. For applications that are very read heavy, we would choose a different product such as Couchbase.
Rekha Joshi
March 07, 2016

Review: "Apache Cassandra - Why Would You Look Elsewhere?"

Score 8 out of 10
Vetted Review
Verified User
Review Source
Apache Cassandra is used extensively across the whole of our organization. It is used for various critical use cases and platform solutions where we are creating highly available, linearly scalable systems with tunable consistency. We have used it actively and rigorously for products within the tax domain, small businesses, profile platforms, and A/B testing platforms, and it is being used across product groups with great success!
  • As a Java-based NoSQL database, it has the greatest community and adoption. Coupled with great Apache Hadoop, Apache Spark, and Solr integration and a strong tools ecosystem (unit tests, stress testing), it is an unbeatable combination!
  • As a hybrid architecture based on a masterless architecture as in DynamoDB and a column family data model as in BigTable, it hits the bull's-eye!
  • It has best-in-class performance across different kinds of read/write/mixed workloads. It provides linear scalability, which makes for the best performance, lowest latency, and highest throughput.
  • Its tunable consistency model lets you have the consistency your platform/application needs.
  • If configured correctly, there is no downtime and no data loss. These are key criteria in critical domains.
  • Apache Cassandra is lacking in some features which DataStax provides in the Enterprise version, for example security and advanced tools like OpsCenter. These would be a great addition to open-source Apache Cassandra.
  • At times we noticed that some versions had issues not known in advance, for example LostNotificationError on repair of nodes. However, the newer releases have steadily become better and more stable.
  • Examples for the DataStax native driver with Cassandra 2.1 could be improved, as they do not cover all the scenarios one needs in production.
  • If you prefer to work with an open-source project and be hands-on, Apache Cassandra is one of the best. However, if you need a managed Cassandra-like service where you do not even want to configure/deploy/backup/restack, a service such as DynamoDB would be preferable.
  • Cassandra is a JVM-based NoSQL database, so garbage collector tuning is a key aspect. Garbage collection is better on JDK 8 with the G1GC collector; alternatively, configure the ConcurrentMarkSweep (CMS) collector optimally.

Apache Cassandra is a NoSQL database and well suited where you need highly available, linearly scalable, tunable consistency and high performance across varying workloads. It has worked well for our use cases, and I shared my experiences to use it effectively at the last Cassandra summit! http://bit.ly/1Ok56TK

It is a NoSQL database, but you can tune it to be strongly consistent and use it successfully as such. However, that is not the usual pattern, as you trade off latency. It works well if you require that. If your use case needs strongly consistent environments with the semantics of a relational database, or if the use case needs a data warehouse, or if you need NoSQL with ACID transactions, Apache Cassandra may not be the optimum choice.

David Prinzing
October 16, 2015

"Cassandra, hands-on review, after 4 years of serious use"

Score 10 out of 10
Vetted Review
Verified User
Review Source
Cassandra is the only database used by Algorithmic Ads. We use it for both real-time transactions and analytics. The primary application accessing Cassandra is a light-weight Java application that provides a RESTful web services API for all our other applications. The API is a focal point for integration and includes both business logic and data. The same API is used both internally and by our customers. We rely on Cassandra for its amazing performance, linear scalability, and continuous availability.
  • Continuous availability: as a fully distributed database (no master nodes), we can update nodes with rolling restarts and accommodate minor outages without impacting our customer services.
  • Linear scalability: for every unit of compute that you add, you get an equivalent unit of capacity. The same application can scale from a single developer's laptop to a web-scale service with billions of rows in a table.
  • Amazing performance: if you design your data model correctly, bearing in mind the queries you need to answer, you can get answers in milliseconds.
  • Time-series data: Cassandra excels at recording, processing, and retrieving time-series data. It's a simple matter to version everything and simply record what happens, rather than going back and editing things. Then, you can compute things from the recorded history.
  • Cassandra is a poor choice for implementing application queues.
  • NoSQL requires thinking differently, and can be challenging for people with strong relational database backgrounds to understand. The CQL language helps with this, but it pays to understand how the engine works under the hood. That said, the benefits outweigh the challenge of the learning curve!
  • Database compactions and anti-entropy repair can be burdensome on a busy cluster. Significant improvements have been made in recent versions, but it remains as an operational challenge.
Cassandra excels in a broad range of applications -- especially if you understand its data model and write your applications accordingly. It's an excellent choice for time-series data, and a poor choice for application queues. It performs the best if you can simply record history and compute from it, rather than going back and editing or deleting things a lot.
Kalpesh Gada
October 16, 2015

User Review: "Cassandra Rocks !!!"

Score 9 out of 10
Vetted Review
Verified User
Review Source
We used Cassandra to store our customers' personalization data so that the information is available throughout the cluster. The primary advantage of Cassandra is the cluster configuration, which leaves no single point of failure. Writes to storage are fast. We used it to store data in JSON format, which can hold just about anything. The data was always up to date and read latency was low. I would highly recommend using Cassandra to make a system more scalable and process requests faster.
  • Cassandra is highly scalable.
  • It provides the flexibility to store data in any format. You can add column families dynamically as needed by the application.
  • One of the best noSQL solutions I've used so far.
  • A better UI access for reading the data.
  • More graphical information to understand how the data is being processed, system uptime/downtime, etc.
  • I used cassandra-cli for running queries, but it is not very helpful when it returns a lot of results. If there were some way to improve the user queries, it would be great.
I think Cassandra is well suited when we want to store general data that is not really about banking transactions. There is a learning curve involved on how the data is stored and how it is processed.
Gary Ogasawara
October 12, 2015

Review: "Cassandra as a building block for a distributed object storage system"

Score 9 out of 10
Vetted Review
Verified User
Review Source
Cassandra is used as a component of our HyperStore S3-compatible object storage system. Cassandra is installed on each node and provides the distributed system logic to determine how to store objects. The other components are primarily Java servers that we wrote that work in conjunction with Cassandra to provide a scalable, peer-to-peer, highly fault tolerant system.
  • Performant. In particular, write performance is very good. Recently, a lot of work to address the changing systems environment has been done to take advantage of areas like SSDs and very dense storage systems.
  • Distributed system logic. Multiple data centers and other common network configurations like heterogeneous nodes are handled and exploited well.
  • Community. Strong community with users and project contributors worldwide. The open-source and commercial software people work well together with sharing of lessons learned and improvements based on feedback.
  • Operational tools. Would like to see continued work to improve the operational capability for large clusters and large amounts of data. For example, analyzing the on-disk files.
  • Repair. Being able to run repair continuously and with greater control to avoid any spikes in resource use.
Well suited for multiple data centers, large networks, heterogeneous hardware.
July 06, 2017

Review: "Cassandra as NoSQL fault tolerant database choice"

Score 9 out of 10
Vetted Review
Verified User
Review Source
Cassandra is an open-source NoSQL database solution offered by Apache. What's nice about Cassandra is its ability to host the data in multiple nodes in a ring, and changes made to a node in the ring will shard the update to the rest. For geographically dispersed architecture requiring local database storage, this can be a valuable asset which makes this NoSQL option stand above the rest.
  • Cassandra can perform reads and writes very quickly
  • Nodes in a ring will keep up to date by sharding information to each other
  • Cassandra is well suited for scalable application needing keyspace storage
  • Cassandra's query language is clunky, which is likely due to the nature of NoSQL.
  • Lacking the ability to relate data between sets makes querying harder, but this again is the nature of NoSQL.
Cassandra is suited for applications that need quick read and write abilities. The key to column family relationship allows for super quick lookup and inserts. The nature of the ring cluster allows for fault tolerance, as well as geo-redundant storage. Cassandra is not well suited when needing to use the data to make relational inferences.
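The ring this review refers to can be sketched as a toy token ring. This is an illustration under simplifying assumptions rather than Cassandra's implementation: SHA-256 stands in for the partitioner's hash, each node gets a single token (no virtual nodes, racks, or data centers), and the names `token`, `Ring`, and `replicas` are hypothetical.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (SHA-256 is a stand-in hash)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class Ring:
    """Toy token ring: one token per node, no virtual nodes, no racks."""

    def __init__(self, nodes, replication_factor=3):
        self.rf = replication_factor
        self.tokens = sorted((token(name), name) for name in nodes)

    def replicas(self, key: str):
        """Walk clockwise from the key's token and return the owning nodes."""
        start = bisect.bisect_left(self.tokens, (token(key), ""))
        size = len(self.tokens)
        count = min(self.rf, size)
        return [self.tokens[(start + i) % size][1] for i in range(count)]
```

Any node can compute the owners of a key from the token list alone, so no master is needed to route a request. If the first owner disappears, the remaining owners still hold the key's data, which is the fault tolerance the reviewer describes.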

June 07, 2016

User Review: "What makes Cassandra different!!!!"

Score 6 out of 10
Vetted Review
Verified User
Review Source
I used Cassandra in my academic projects, which were related to cloud computing. I used it for a few projects on Salesforce where multi-tenancy features are implemented. In such scenarios Cassandra was one of the best choices for NoSQL. Although we have used an RDBMS, performance was better with Cassandra.

I have simulated a few real-time apps like Facebook and Uber using both an RDBMS and Cassandra, and checked the performance using JMeter. It clearly shows that Cassandra boosts performance over an RDBMS. One thing I find difficult in Cassandra is following the documentation, which is not very understandable.
  • Undoubtedly performance is an important reason
  • We have not encountered a single point of failure
  • Scalability of Cassandra is good which is the most important for the companies where demand is scaling day by day.
  • Cassandra has a wide range of asynchronous jobs and background tasks that are not scheduled by the client, so execution can be erratic.
  • Because Cassandra is a key-value store, aggregations such as SUM, MIN, MAX, and AVG are incredibly resource intensive, if they are possible at all.
  • I think querying options for retrieving data is very limited.
Well suited:
  • Tunable consistency
  • Write speed

Less appropriate:
  • Ad-hoc queries
  • Unpredictable performance
October 13, 2015

Review: "Cassandra - pretty good if you know what you are doing"

Score 8 out of 10
Vetted Review
Verified User
Review Source
Cassandra is being used as a time series store for sensor data and is used by several researchers within our department.
It serves as the storage layer in our home-grown sensor analytics platform, which uses Spark for computation. We use it to store billions of samples of wearable sensor data collected in various studies and experiments.
  • High Availability - we utilize the data replication features of Cassandra. This enables us to access our data even when several nodes have gone down
  • Data Locality - our architecture combines Cassandra storage nodes and computation nodes in the same machine. This enables us to utilize data locality and limit expensive network IO to read data.
  • Elasticity - Cassandra is a shared nothing architecture. Nodes can be added very easily and they discover the network topology. As soon as a node has joined the Cassandra ring, the data is redistributed among the existing nodes and streamed to it automatically.
  • Cassandra runs on the JVM and therefore may require a lot of GC tuning for read/write-intensive applications.
  • Requires manual periodic maintenance - for example, it is recommended to run a cleanup on a regular basis.
  • There are a lot of knobs and buttons to configure the system. For many cases the default configuration will be sufficient, but if it's not, you will need significant ramp-up on the inner workings of Cassandra in order to tune it effectively.
Cassandra has excellent high availability and partition tolerance and has a robust architecture.
It is well suited for storing immutable data as deletes are extremely inefficient. As such, it is well suited for data archive and deep storage.
It is less appropriate for OLAP, as it has limited aggregation and filtering abilities and no grouping whatsoever.
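The remark above about deletes being inefficient relates to tombstones, which other reviewers on this page also mention. A delete in Cassandra is recorded as a marker rather than removing data immediately, and the marker is only purged by compaction once a grace period (the `gc_grace_seconds` table option) has passed. The sketch below is a heavily simplified toy model of that idea; the class `ToyTable` and its methods are hypothetical, and time is a plain integer.

```python
TOMBSTONE = object()  # marker written in place of a deleted value


class ToyTable:
    """Toy model: deletes write a tombstone that lingers until compaction purges it."""

    def __init__(self):
        self.cells = {}  # key -> (write_time, value or TOMBSTONE)

    def write(self, key, value, now):
        self.cells[key] = (now, value)

    def delete(self, key, now):
        # A delete is just another write; nothing is removed yet.
        self.cells[key] = (now, TOMBSTONE)

    def read(self, key):
        entry = self.cells.get(key)
        if entry is None or entry[1] is TOMBSTONE:
            return None
        return entry[1]

    def tombstone_count(self):
        return sum(1 for _, value in self.cells.values() if value is TOMBSTONE)

    def compact(self, now, gc_grace):
        """Purge tombstones at least gc_grace old; return how many were dropped."""
        expired = [key for key, (written, value) in self.cells.items()
                   if value is TOMBSTONE and now - written >= gc_grace]
        for key in expired:
            del self.cells[key]
        return len(expired)
```

Until compaction runs after the grace period, every deleted key still occupies space and has to be skipped on reads, which is one reason append-only or immutable data fits this storage model better than delete-heavy workloads.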

Feature Scorecard Summary

Performance (5): 8.2
Availability (5): 8.6
Concurrency (5): 7.8
Security (5): 8.0
Scalability (5): 9.2
Data model flexibility (5): 6.6
Deployment model flexibility (5): 7.0

About Cassandra

Cassandra is a NoSQL database from Apache.
Categories:  NoSQL Databases

Cassandra Technical Details

Operating Systems: Unspecified
Mobile Application: No