Amazon EMR is a cloud-native big data platform for processing vast amounts of data quickly, at scale. Using open source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi (Incubating), and Presto, coupled with the scalability of Amazon EC2 and scalable storage of Amazon S3, EMR gives analytical teams the engines and elasticity to run Petabyte-scale analysis.
EDB Postgres Advanced Server
Score 10.0 out of 10
The EDB Postgres Advanced Server is an advanced deployment of the PostgreSQL relational database with greater features and Oracle compatibility, from EnterpriseDB headquartered in Bedford, Massachusetts.
PostgreSQL
Score 8.7 out of 10
PostgreSQL (alternately Postgres) is a free and open source object-relational database system boasting over 30 years of active development, reliability, feature robustness, and performance. It supports SQL and is designed to support various workloads flexibly.
We use it to run data preparation that takes a few hours on EC2 on a Spark-based EMR cluster instead, which completes the preparation in minutes rather than hours. It is easy to use and lets us choose between Hadoop and Spark. Processing time drops from 5-8 hours to 25-30 minutes compared with the EC2 instance, and by more in a few cases.
It's great if you are using or wish to use PostgreSQL and need the added performance optimization, security features and developer and DBA tools. If you need compatibility with Oracle it's a must-have. There are many developer features that greatly assist dev teams in integrating and implementing complex middleware. It's great for optimizing complex database queries as well as for scaling. I would recommend Postgres Plus Advanced Server for any software development team that is hitting the limit of what PostgreSQL is capable of and wants to improve performance, security, and gain extra developer tools.
PostgreSQL is best used for structured data, and works best when you follow relational database design principles. I would not use PostgreSQL for large unstructured data such as video, images, sound files, XML documents, or web pages, especially if these files have their own highly variable internal structure.
EMR manages cost well: it uses the task node cores to process the data, and these instances are cheaper when the data is stored on S3. It is really cost efficient. No need to maintain any libraries to connect to AWS resources.
EMR is highly available, secure and easy to launch. Not much hassle in launching the cluster (simple and easy).
EMR manages the big data frameworks, so the developer need not worry about memory and framework settings. It's all set up at launch time. The bootstrapping feature is great.
PPAS Oracle compatibility, especially the PL/SQL syntax, has made migrating database-tier code very simple. Most Oracle packages do not need to be changed at all and those that do are generally for simple reasons like a reserved word in PPAS that is allowed in Oracle.
PPAS xDB, the multi-master replication tool, is simple and - most important - does not break with network or other interruptions. We have been able to configure and forget, which our customers could never do with other multi-master tools.
Most people had no idea that PPAS and PostgreSQL have full CRUD support for JSON. They think you need a specialized product and/or that JSON is read-only. Every organization that I have worked with is evaluating adding JSON to their relational model.
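The full CRUD cycle on JSON mentioned above can be sketched roughly as follows. This is a minimal illustration, assuming PostgreSQL 9.5 or later (for `jsonb_set` and the `-` key-removal operator) and a DB-API cursor that uses the `%s` parameter style, such as one from psycopg2. The table name `docs`, the column `body`, the keys `name` and `draft`, and the function `json_crud` are all hypothetical names chosen for the example.

```python
import json

# Hypothetical table and column names, chosen only for this sketch.
CREATE_SQL = "CREATE TABLE IF NOT EXISTS docs (id serial PRIMARY KEY, body jsonb)"
INSERT_SQL = "INSERT INTO docs (body) VALUES (%s::jsonb) RETURNING id"
READ_SQL = "SELECT body->>'name' FROM docs WHERE id = %s"
UPDATE_SQL = (
    "UPDATE docs SET body = jsonb_set(body, '{name}', to_jsonb(%s::text)) "
    "WHERE id = %s"
)
REMOVE_KEY_SQL = "UPDATE docs SET body = body - 'draft' WHERE id = %s"
DELETE_SQL = "DELETE FROM docs WHERE id = %s"


def json_crud(cur, doc, new_name):
    """Create, read, update, and delete one JSON document.

    `cur` is any DB-API cursor using the %s parameter style; it is
    passed in so the sketch assumes nothing about connection details.
    Returns the value of the 'name' key as it was first stored.
    """
    cur.execute(CREATE_SQL)
    # Create: store the document as jsonb and get its generated id.
    cur.execute(INSERT_SQL, (json.dumps(doc),))
    doc_id = cur.fetchone()[0]
    # Read: extract a single key as text.
    cur.execute(READ_SQL, (doc_id,))
    original_name = cur.fetchone()[0]
    # Update: change one key in place, then drop another key.
    cur.execute(UPDATE_SQL, (new_name, doc_id))
    cur.execute(REMOVE_KEY_SQL, (doc_id,))
    # Delete: remove the whole row.
    cur.execute(DELETE_SQL, (doc_id,))
    return original_name
```

With a real connection, the function would be called with a cursor obtained from the driver and followed by a commit; the point is only that JSON values can be inserted, queried, modified, and removed with ordinary SQL.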
It would have been better if packages like HBase and Flume were available with Amazon EMR. This would make the product even more helpful in some cases.
Products like Cloudera provide the options to move the whole deployment into a dedicated server and use it at our discretion. This would have been a good option if available with EMR.
If EMR gave the option to be used with any choice of cloud provider, it would have helped instead of having to move the data from another cloud service to S3.
Documentation is excellent but spread out across many resources and can take a while to wade through. It would benefit from more intro-level, getting-started guides for various languages.
Ruby support is excellent but more Ruby examples and beginner-level documentation would be nice.
It is sometimes hard to find a community of users on StackOverflow so a larger community, and a dedicated forum with active members to answer questions and work through issues would be nice.
Documentation is quite good and the product is regularly updated, so new features regularly come out. The setup is straightforward enough, especially once you have already established the overall platform infrastructure, and the aws-cli APIs are easy enough to use. It would be nice to have some out-of-the-box integrations for checking logs and the Spark UI, rather than relying on know-how and digging through multiple levels to find the information.
PostgreSQL is the best tool out there for relational data, so I have to give it a high rating when it comes to analytics, data availability and consistency, and so on. SQL is also a relatively consistent language, so when it comes to building new tables and loading data in from the OLTP database, there are enough tools for us to perform ETL on a scalable basis.
The data queries are relatively quick for a small to medium sized table. With complex joins, and a wide and deep table however, the performance of the query has room for improvement.
I give the overall support for Amazon EMR this rating because while the support technicians are very knowledgeable and always able to help, it sometimes takes a very long time to get in contact with one of the support technicians. So overall the support is pretty good for Amazon EMR.
There are several companies that you can contract for technical support, like EnterpriseDB or Percona, both first level in expertise and commitment to the software.
But we do not have contracts with them; we have relied on everything from googling to forums, and have never had a problem that we could not resolve or work around. That holds across dozens of projects and more than 15 years now.
The online training is request based. Had there been recorded videos available online for potential users to benefit from, I could have rated it higher. The online documentation, however, is very helpful. The online documentation PDF is downloadable and allows users to pace their own learning. With examples and code snippets, the documentation is a great starting point.
Snowflake is a lot easier to get started with than the other options. Snowflake's data lake building capabilities are far more powerful. Although Amazon EMR isn't our first pick, we've had an excellent experience with EC2 and S3. Because of our current API interfaces, it made more sense for us to continue with Hadoop rather than explore other options.
PPAS proved better for our customer's data-centric apps than Oracle in all but a few edge cases (encryption at rest and multi-TB database-tier backups) because it is simpler to install and maintain, and runs nearly all Oracle-syntax SQL as well as ANSI SQL. PPAS has many more JSON capabilities (full CRUD vs. read-only in Oracle), simpler geospatial, simpler and more stable replication, and datatypes that match developer expectations, such as BOOLEAN and ENUMs.
Competition between databases is increasingly aggressive, with vendors adding many improvements, new features, and compatibility with complementary components or environments. In some cases, though, that compatibility requires staying within the same family of applications from the company that develops the database. That is not all bad, but being able to adapt to or configure programs, applications, and environments developed by third parties is what gives PostgreSQL a certain advantage. This diversity in the components that can be combined with it is the reason why it is a great option to choose.
It was obviously cheaper and more convenient to use, as most of our data processing and pipelines are on AWS. It was fast and readily available with a click, and that saved a ton of time compared with having to work around cluster downtime if it's on premises.
It saved time on processing chunks of big data which had to be processed in a short period with minimal costs. EMR solved this, as the cluster setup and processing were simple, easy, cheap and fast.
It had a negative impact in that it was very difficult to submit test jobs, as it lacks a UI for submitting Spark code snippets.
Postgres Plus Advanced Server is quite complex and may take longer to implement certain things than simply using PostgreSQL depending on developer familiarity with the platform.
Getting up to speed can be daunting so again, there is an upfront cost in time spent learning the platform, besides the potential for extra time spent on a feature-by-feature basis.
The cost of Postgres Plus Advanced Server should be weighed against simply using PostgreSQL to decide which is the best solution for your business needs.
Easy to administer, so our DevOps team has only ever needed minimal time to set up, tune, and maintain it.
Easy to interface with, so our Engineering team has only ever needed minimal time to query or modify the database. Getting the data is straightforward; what we do with it is the bigger concern.