Apache Airflow vs. IBM DataStage

Overview
Apache Airflow
Rating: Score 8.6 out of 10
Most Used By: N/A
Product Summary: Apache Airflow is an open source tool that can be used to programmatically author, schedule and monitor data pipelines using Python and SQL. Created at Airbnb as an open-source project in 2014, Airflow was brought into the Apache Software Foundation’s Incubator Program in 2016 and announced as a Top-Level Apache Project in 2019. It is used as a data orchestration solution, with over 140 integrations and community support.
Starting Price: N/A
IBM DataStage
Rating: Score 8.0 out of 10
Most Used By: N/A
Product Summary: IBM® DataStage® is a data integration tool that helps users to design, develop and run jobs that move and transform data. At its core, the DataStage tool supports extract, transform and load (ETL) and extract, load and transform (ELT) patterns. A basic version of the software is available for on-premises deployment, and the cloud-based DataStage for IBM Cloud Pak® for Data offers automated integration capabilities in a hybrid or multicloud environment.
Starting Price: N/A
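The ETL and ELT patterns named in the DataStage summary differ mainly in where the transformation runs. The sketch below illustrates that difference with Python's built-in sqlite3 module standing in for the target system. It does not use DataStage, and the table names, function names and sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical raw records (customer, amount as text); values are made up.
RAW_ROWS = [("alice", "10.50"), ("bob", "3.25"), ("alice", "4.50")]


def run_etl(rows):
    """ETL: transform in application code first, then load the finished result."""
    totals = {}
    for customer, amount in rows:
        totals[customer] = totals.get(customer, 0.0) + float(amount)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer_totals (customer TEXT, total REAL)")
    conn.executemany(
        "INSERT INTO customer_totals VALUES (?, ?)", sorted(totals.items())
    )
    return conn.execute(
        "SELECT customer, total FROM customer_totals ORDER BY customer"
    ).fetchall()


def run_elt(rows):
    """ELT: load the raw data unchanged, then transform inside the target with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_orders (customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", rows)
    conn.execute(
        "CREATE TABLE customer_totals AS "
        "SELECT customer, SUM(CAST(amount AS REAL)) AS total "
        "FROM raw_orders GROUP BY customer"
    )
    return conn.execute(
        "SELECT customer, total FROM customer_totals ORDER BY customer"
    ).fetchall()


etl_result = run_etl(RAW_ROWS)
elt_result = run_elt(RAW_ROWS)
```

Both functions end with the same totals table. The ELT variant also keeps the raw rows in the target, which is the usual reason to prefer it when the target database is powerful enough to do the transformation itself.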
Pricing
Editions & Modules
Apache Airflow: No answers on this topic
IBM DataStage: No answers on this topic
Offerings
Free Trial: Apache Airflow No; IBM DataStage Yes
Free/Freemium Version: Apache Airflow Yes; IBM DataStage No
Premium Consulting/Integration Services: Apache Airflow No; IBM DataStage No
Entry-level Setup Fee: Apache Airflow No setup fee; IBM DataStage No setup fee
Features
Workload Automation
Apache Airflow: 8.8 (12 ratings), 5% above category average
IBM DataStage: - (no ratings)
Multi-platform scheduling: Apache Airflow 9.3 (12 ratings); IBM DataStage 0 ratings
Central monitoring: Apache Airflow 9.0 (12 ratings); IBM DataStage 0 ratings
Logging: Apache Airflow 8.7 (12 ratings); IBM DataStage 0 ratings
Alerts and notifications: Apache Airflow 9.3 (12 ratings); IBM DataStage 0 ratings
Analysis and visualization: Apache Airflow 7.0 (12 ratings); IBM DataStage 0 ratings
Application integration: Apache Airflow 9.3 (12 ratings); IBM DataStage 0 ratings
Data Source Connection
Apache Airflow: - (no ratings)
IBM DataStage: 9.5 (10 ratings), 13% above category average
Connect to traditional data sources: Apache Airflow 0 ratings; IBM DataStage 10.0 (10 ratings)
Connect to Big Data and NoSQL: Apache Airflow 0 ratings; IBM DataStage 9.0 (9 ratings)
Data Transformations
Apache Airflow: - (no ratings)
IBM DataStage: 8.0 (10 ratings), 3% below category average
Simple transformations: Apache Airflow 0 ratings; IBM DataStage 8.0 (10 ratings)
Complex transformations: Apache Airflow 0 ratings; IBM DataStage 8.0 (10 ratings)
Data Modeling
Apache Airflow: - (no ratings)
IBM DataStage: 6.3 (10 ratings), 23% below category average
Data model creation: Apache Airflow 0 ratings; IBM DataStage 5.0 (7 ratings)
Metadata management: Apache Airflow 0 ratings; IBM DataStage 5.0 (9 ratings)
Business rules and workflow: Apache Airflow 0 ratings; IBM DataStage 6.0 (9 ratings)
Collaboration: Apache Airflow 0 ratings; IBM DataStage 6.0 (10 ratings)
Testing and debugging: Apache Airflow 0 ratings; IBM DataStage 6.0 (10 ratings)
Data Governance
Apache Airflow: - (no ratings)
IBM DataStage: 6.0 (9 ratings), 31% below category average
Integration with data quality tools: Apache Airflow 0 ratings; IBM DataStage 6.0 (9 ratings)
Integration with MDM tools: Apache Airflow 0 ratings; IBM DataStage 6.0 (9 ratings)
Best Alternatives
Small Businesses
Apache Airflow: No answers on this topic
IBM DataStage: Skyvia (Score 10.0 out of 10)
Medium-sized Companies
Apache Airflow: ActiveBatch Workload Automation (Score 7.8 out of 10)
IBM DataStage: IBM InfoSphere Information Server (Score 8.0 out of 10)
Enterprises
Apache Airflow: Control-M (Score 9.3 out of 10)
IBM DataStage: IBM InfoSphere Information Server (Score 8.0 out of 10)
User Ratings
Likelihood to Recommend: Apache Airflow 8.5 (10 ratings); IBM DataStage 8.0 (10 ratings)
Usability: Apache Airflow 10.0 (1 rating); IBM DataStage 8.0 (3 ratings)
Performance: Apache Airflow - (0 ratings); IBM DataStage 9.0 (1 rating)
Support Rating: Apache Airflow - (0 ratings); IBM DataStage 9.6 (3 ratings)
User Testimonials
Likelihood to Recommend
Apache
Airflow is well-suited for data engineering pipelines, creating scheduled workflows, and working with various data sources. You can implement almost any kind of DAG for any use case using the different operators, or write your own logic with the Python operator with ease. The MLOps features of Airflow could be enhanced to match MLflow-like features, making Airflow the go-to solution for all workloads, from data science to data engineering.
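To make the DAG idea in this review concrete, here is a minimal sketch of tasks with dependencies run in dependency order. It uses only Python's standard library (graphlib) and is not Airflow's API; in Airflow, operators such as the Python operator would take the place of the plain functions. The task names and the run_pipeline helper are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def extract(context):
    # Stand-in for reading from a source system.
    context["rows"] = [1, 2, 3]


def transform(context):
    context["rows"] = [value * 10 for value in context["rows"]]


def load(context):
    # Stand-in for writing to a target; here we only record a total.
    context["loaded_total"] = sum(context["rows"])


def notify(context):
    context["message"] = f"loaded total {context['loaded_total']}"


# Hypothetical pipeline: each task maps to the set of tasks it depends on.
DEPENDENCIES = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}
TASKS = {"extract": extract, "transform": transform, "load": load, "notify": notify}


def run_pipeline(dependencies, tasks):
    """Run every task after all of its upstream tasks have finished."""
    context = {}
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        tasks[name](context)
    return order, context


order, context = run_pipeline(DEPENDENCIES, TASKS)
```

A scheduler like Airflow adds what this sketch leaves out: running the graph on a schedule, retrying failed tasks, and recording the state of every run.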
IBM
An excellent cloud data mapping tool. It makes it easy to create data analytics for multiple projects in real time, and report distribution through this IBM product is excellent. It is an easy tool for providing data visualization, and the integration is effective and helpful for migrating huge amounts of data across other platforms and for gathering insights from different websites.
Pros
Apache
  • Apache Airflow is one of the best orchestration platforms and a go-to scheduler for teams building a data platform or pipelines.
  • Apache Airflow supports multiple operators, such as the Databricks, Spark, and Python operators. All of these provide us with functionality to implement any business logic.
  • Apache Airflow is highly scalable, and we can run a large number of DAGs with ease. It provides HA and replication for workers. Maintaining Airflow deployments is very easy, even for smaller teams, and we also get lots of metrics for observability.
IBM
  • Data movement
  • Seamless integration of scripts and ETL jobs
  • Descriptive logging
  • Ability to work with a myriad of data assets
  • Direct integration for Governance catalog
Cons
Apache
  • The UI/dashboard could be made customisable, with a job summary grouped by errors/failures/successes instead of listing each job, so that the error summary can serve as a starting point for reviewing them.
  • Navigation: it's a bit dated and could do with a more modern web navigation UX, i.e. sidebar navigation instead of browser back/forward.
  • Again, a reorganization of core functions in terms of UX. Navigation could be improved so that core functions are directly reachable instead of having to be discovered.
IBM
  • Connector Stages to Snowflake on the cloud. We had some issues initially, but they have since been corrected.
  • Accessing the tool from a browser (zero footprint). Currently we need to either install it locally or connect to a server to do ETL work.
  • Diversify ways of authenticating users.
Usability
Apache
For its capability to connect with multicloud environments. Access control management is something that we don't get in all schedulers and orchestrators. But although it provides a lot of flexibility and options thanks to Python, some knowledge of Python is needed to be able to build workflows.
IBM
Because it is robust, and it is being continuously improved. DS is one of the most used and recognized tools in the market. Large companies have implemented it in the first instance to develop their DW, but finding the advantages it has, they could use it for other types of projects such as migrations, application feeding, etc.
Performance
Apache
No answers on this topic
IBM
It can load thousands of records in seconds. But in the Parallel version, you need to understand how to partition the data. If you misuse the partitioning algorithms, or the functions it provides for parsing data, performance can fall drastically, even with few records. You need experienced people who can determine which algorithm to use and understand why.
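The effect this reviewer describes can be illustrated with a small, generic sketch of two common partitioning schemes, hash and round-robin. It is plain Python, not DataStage's own configuration or API, and the record layout and function names are hypothetical. Hashing on a key with few distinct values sends most records to one partition, so one worker does nearly all the work.

```python
import zlib


def hash_partition(records, key, num_partitions):
    """Assign each record to a partition by hashing its key (deterministic CRC32)."""
    partitions = [[] for _ in range(num_partitions)]
    for record in records:
        digest = zlib.crc32(str(record[key]).encode("utf-8"))
        partitions[digest % num_partitions].append(record)
    return partitions


def round_robin_partition(records, num_partitions):
    """Deal records out in turn, ignoring their content."""
    partitions = [[] for _ in range(num_partitions)]
    for index, record in enumerate(records):
        partitions[index % num_partitions].append(record)
    return partitions


# Hypothetical data: 1,000 records, 90% of which share the same country value.
records = [{"id": i, "country": "US" if i % 10 else "DE"} for i in range(1000)]

# Partition sizes under three choices of scheme and key.
skewed_sizes = [len(p) for p in hash_partition(records, "country", 4)]
by_id_sizes = [len(p) for p in hash_partition(records, "id", 4)]
even_sizes = [len(p) for p in round_robin_partition(records, 4)]
```

Round-robin gives an even spread but does not keep records with the same key together, which joins and aggregations by key need. Hashing keeps equal keys together but is only balanced when the key has many evenly distributed values, which is why choosing the scheme takes some experience.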
Support Rating
Apache
No answers on this topic
IBM
I believe that IBM generally has one of the worst and most complex assistance systems (physical and online) that exists.
Alternatives Considered
Apache
Multiple DAGs can be orchestrated simultaneously at varying times, and runs can be reproduced or replicated with relative ease. Overall, Apache Airflow is easier to use than other solutions now on the market. Integration is simple in Apache Airflow, workflows can be monitored, and scheduling can be done quickly. We advocate using this tool for automating data pipelines or processes.
IBM
It's obvious, since they are both from the same vendor, which makes things easier and can get us better rates for licensing. Also, sales reps are very helpful in case of escalations and critical issues.
Return on Investment
Apache
  • Impact depends on the number of workflows. With a lot of workflows it has a better use case, because the implementation is justified: it needs resources (dedicated VMs, a database) that have a cost.
  • Do not use it if you have very few use cases.
IBM
  • Reduce development time by 65% compared with hand coding.
  • Reduces ETL process maintenance times.
  • Better data governance for technical and non-technical people.
  • Improve time to market for initiatives that require data integration.