Apache Spark vs. PostgreSQL

Overview
Product | Rating | Most Used By | Product Summary | Starting Price
Apache Spark | Score 8.7 out of 10 | N/A | N/A | N/A
PostgreSQL | Score 8.5 out of 10 | N/A | PostgreSQL (alternately Postgres) is a free and open source object-relational database system boasting over 30 years of active development, reliability, feature robustness, and performance. It supports SQL and is designed to support various workloads flexibly. | N/A
Pricing
Editions & Modules
Apache Spark | PostgreSQL
No answers on this topic | No answers on this topic
Pricing Offerings
Offering | Apache Spark | PostgreSQL
Free Trial | No | No
Free/Freemium Version | No | No
Premium Consulting/Integration Services | No | No
Entry-level Setup Fee | No setup fee | No setup fee
Community Pulse
Considered Both Products
Apache Spark

No answer on this topic

PostgreSQL
Chose PostgreSQL
I found PostgreSQL to be better than MySQL. The community support is very good. Some features that I feel MySQL lacks are:
  • Referential integrity.
  • CHECK constraints.
Chose PostgreSQL
Compared to MySQL, it works well if you need to extend to your use case
Compared to Spark, it works better w.r.t development time in a central database setting
Unlike Redis, it cannot be used for caching and quick access to unstructured data
Best Alternatives
Company Size | Apache Spark | PostgreSQL
Small Businesses | No answers on this topic | Redis™* (Score 9.0 out of 10)
Medium-sized Companies | Cloudera Manager (Score 9.7 out of 10) | Redis™* (Score 9.0 out of 10)
Enterprises | IBM Analytics Engine (Score 9.3 out of 10) | Redis™* (Score 9.0 out of 10)
User Ratings
Rating | Apache Spark | PostgreSQL
Likelihood to Recommend | 9.9 (24 ratings) | 8.7 (53 ratings)
Likelihood to Renew | 10.0 (1 rating) | 9.0 (1 rating)
Usability | 10.0 (3 ratings) | 9.0 (6 ratings)
Availability | - (0 ratings) | 9.0 (1 rating)
Performance | - (0 ratings) | 7.0 (1 rating)
Support Rating | 8.7 (4 ratings) | 9.3 (7 ratings)
Implementation Rating | - (0 ratings) | 9.0 (1 rating)
Product Scalability | - (0 ratings) | 8.0 (1 rating)
User Testimonials
Likelihood to Recommend
Apache
Well suited: For most local runs of datasets and non-prod systems, scalability is not a problem at all. Including data from multiple types of data sources is an added advantage. MLlib is a decent built-in library that can be used for most ML tasks. Less appropriate: We had to work on a RecSys where the music dataset we used was over 300 GB in size. We faced memory-related issues, and a few times we also got memory errors. Also, the MLlib library does not support advanced analytics or deep-learning frameworks. Understanding the internals of Apache Spark is very difficult for beginners.
PostgreSQL Global Development Group
PostgreSQL, unlike other databases, is user-friendly and open source. It is ideal for relational data when speed and efficiency are required. It enables high-availability and disaster-recovery replication from instance to instance. PostgreSQL can store data in JSON format, including hashes, keys, and values. Multi-platform compatibility is also a big selling point. We could, however, use all the DBMS’s cores. While it works well in fast environments, it can be problematic in slower ones or with multi-master replication.
Pros
Apache
  • Apache Spark makes processing very large data sets possible. It handles these data sets in a fairly quick manner.
  • Apache Spark does a fairly good job implementing machine learning models for larger data sets.
  • Apache Spark seems to be rapidly advancing software, with new features making it ever more straightforward to use.
PostgreSQL Global Development Group
  • The stability it offers, its speed of response, and its resource management are excellent, even in complex database environments and on low-resource machines.
  • The large amount of resources available, along with the many native and third-party compatible tools, greatly increases productivity.
  • The adaptability to various environments, whether distributed or not, [is a] complete set of configuration options that lets you greatly customize the configuration according to your needs.
  • The excellent handling of referential and transactional integrity, its internal security scheme, and the ease with which we can create backups are some of the strengths that can be mentioned.
Cons
Apache
  • Memory management is very weak.
  • PySpark is not as robust as Scala with Spark.
  • Spark master HA is needed. It is not as highly available as it should be.
  • Locality should not be a necessity, although it does improve performance. I would prefer no locality requirement.
PostgreSQL Global Development Group
  • The query syntax for JSON fields is unwieldy when you start getting into complex queries with many joins.
  • I wish there was a distinction (a flag) you could set for automated scripts vs working in the psql CLI, which would provide an 'Are you sure you want to do X?' type prompt if your query is likely to affect more than a certain number of rows. Especially on updates/deletes. Setting the flag in the headless(scripted) flow would disable the prompt.
  • Better documentation around JSON and Array aggregation, with more examples of how the data is transformed.
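The confirmation prompt wished for in the second point above can be approximated on the client side. The following is a minimal sketch, not a psql or PostgreSQL feature: the helper `guarded_execute`, its `max_rows` threshold, and its `headless` flag are hypothetical names invented for illustration, and Python's built-in `sqlite3` module stands in for a PostgreSQL connection only so that the example runs without a server.

```python
import sqlite3


# Hypothetical helper: none of these names come from PostgreSQL or psql.
def guarded_execute(conn, count_sql, change_sql, params=(), max_rows=100,
                    headless=False, confirm=input):
    """Run change_sql only if it touches few rows or the caller confirms."""
    # Count the rows the statement would affect before running it.
    affected = conn.execute(count_sql, params).fetchone()[0]
    if not headless and affected > max_rows:
        answer = confirm(f"This will affect {affected} rows. Continue? [y/N] ")
        if answer.strip().lower() != "y":
            return 0  # declined, so nothing is changed
    cur = conn.execute(change_sql, params)
    conn.commit()
    return cur.rowcount


def demo():
    # sqlite3 is only a stand-in here (assumption), not PostgreSQL itself.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
    conn.executemany("INSERT INTO orders (status) VALUES (?)",
                     [("stale",)] * 500)
    conn.commit()
    count_sql = "SELECT COUNT(*) FROM orders WHERE status = ?"
    delete_sql = "DELETE FROM orders WHERE status = ?"
    # Interactive flow: the operator answers "n", so no rows are deleted.
    declined = guarded_execute(conn, count_sql, delete_sql, ("stale",),
                               confirm=lambda _prompt: "n")
    # Headless (scripted) flow: the prompt is skipped entirely.
    deleted = guarded_execute(conn, count_sql, delete_sql, ("stale",),
                              headless=True)
    return declined, deleted
```

Counting first and then changing costs an extra query, and the count can drift if other sessions write in between, so this is a convenience guard rather than a safety guarantee.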
Likelihood to Renew
Apache
Capacity of computing data in cluster and fast speed.
PostgreSQL Global Development Group
As a needed software for day to day development activities
Usability
Apache
The only thing I dislike about Spark's usability is the learning curve: there are many actions and transformations. However, its wide range of uses for ETL processing, its ease of integration, and its multi-language support make this library a powerhouse for your data science solutions. It has especially aided us with its lightning-fast processing times.
PostgreSQL Global Development Group
PostgreSQL is the best tool out there for relational data, so I have to give it a high rating when it comes to analytics, data availability, consistency, and so on. SQL is also a relatively consistent language, so when it comes to building new tables and loading data in from the OLTP database, there are enough tools for us to perform ETL at scale.
Reliability and Availability
Apache
No answers on this topic
PostgreSQL Global Development Group
PostgreSQL's availability is top notch. Apart from connection time-out for an idle user, the database is super reliable.
Performance
Apache
No answers on this topic
PostgreSQL Global Development Group
Data queries are relatively quick for small to medium-sized tables. With complex joins and a wide, deep table, however, query performance has room for improvement.
Support Rating
Apache
1. It integrates very well with Scala or Python.
2. Its SQL interoperability is very easy to understand.
3. Apache Spark is much faster than competing technologies.
4. The support from the Apache community for Spark is extensive.
5. Execution times are faster compared to others.
6. There are a large number of forums available for Apache Spark.
7. The code for Apache Spark is easy to access.
8. Many organizations use Apache Spark, so many solutions are available for existing applications.
PostgreSQL Global Development Group
There are several companies that you can contract for technical support, like EnterpriseDB or Percona, both first-rate in expertise and commitment to the software.
But we do not have contracts with them. We have relied on everything from googling to forums and have never had a problem that we could not resolve or work around, across dozens of projects and more than 15 years now.
Online Training
Apache
No answers on this topic
PostgreSQL Global Development Group
The online training is request-based. Had there been recorded videos available online for potential users to benefit from, I could have rated it higher. The online documentation, however, is very helpful. The documentation PDF is downloadable and allows users to pace their own learning. With examples and code snippets, the documentation is a great starting point.
Implementation Rating
Apache
No answers on this topic
PostgreSQL Global Development Group
The online documentation of the PostgreSQL product is elaborate and takes users step by step.
Alternatives Considered
Apache
All the above systems work quite well on big data transformations, whereas Spark really shines with its broader API support and its ability to read from and write to multiple data sources. Using Spark, one can easily switch between declarative, imperative, and functional programming styles based on the situation. Also, it doesn't need special data ingestion or indexing pre-processing like Presto. Combining it with Jupyter Notebooks (https://github.com/jupyter-incubator/sparkmagic), one can develop Spark code interactively in Scala or Python.
PostgreSQL Global Development Group
Postgres stacks up just [fine] along the other big players in the RDBMS world. It's very popular for a reason. It's very close to MySQL in terms of cost and features - I'd pick either solution and be just as happy. Compared to Oracle it is a MUCH cheaper solution that is just as usable.
Scalability
Apache
No answers on this topic
PostgreSQL Global Development Group
The DB is reliable, scalable, and easy to use, and it meets most DB needs.
Return on Investment
Apache
  • Faster turn around on feature development, we have seen a noticeable improvement in our agile development since using Spark.
  • Easy adoption, having multiple departments use the same underlying technology even if the use cases are very different allows for more commonality amongst applications which definitely makes the operations team happy.
  • Performance, we have been able to make some applications run over 20x faster since switching to Spark. This has saved us time, headaches, and operating costs.
PostgreSQL Global Development Group
  • The user-role system has saved us tons of time and thus money. As I mentioned in the "Use Case" section, Postgres is not only used by engineering but also finance to measure how much to charge customers and customer support to debug customer issues. Sure, it's not easy for non-technical employees to psql in and view raw tables, but it has saved engineering hundreds of man-hours that would have had to be spent on building equivalent tools to serve finance or customer support.
  • It provides incredibly trustworthy storage for whatever customer data is dumped in. In our 6 years of using Postgres, we have not lost a byte of customer data due to Postgres messing up a transaction or during the multiple times the hard drives failed (thanks to ACID compliance!).
  • This is less significant, but Postgres is also quite easy to manage (unless you are going above and beyond to squeeze out every last bit of performance). There's not much to configure, and the out of the box settings are quite sane. That has saved us engineers lots of time that would have gone into Postgres administration.
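The atomicity part of the ACID guarantee credited above can be illustrated with a short sketch. It is an illustration under stated assumptions rather than the reviewer's setup: the `accounts` table and the `transfer` helper are hypothetical, and Python's built-in `sqlite3` module stands in for PostgreSQL so that the example runs without a server. The point is only that a failed statement inside a transaction undoes the earlier statements too.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move amount between two accounts atomically (hypothetical schema)."""
    try:
        # The connection context manager commits on success and rolls back
        # if an exception escapes the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit made just
        # before it was rolled back as well.
        return False


def demo():
    # sqlite3 is only a stand-in here (assumption), not PostgreSQL itself.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                     [("alice", 100), ("bob", 0)])
    conn.commit()
    ok = transfer(conn, "alice", "bob", 60)      # succeeds
    failed = transfer(conn, "alice", "bob", 60)  # would overdraw alice
    balances = dict(conn.execute("SELECT name, balance FROM accounts"))
    return ok, failed, balances
```

After the second, failing transfer, the balances are the same as after the first one, so no partial update is left behind.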