TrustRadius
Apache Airflow

Overview

What is Apache Airflow?

Apache Airflow is an open source tool that can be used to programmatically author, schedule and monitor data pipelines using Python and SQL. Created at Airbnb as an open-source project in 2014, Airflow was brought into the Apache Software Foundation’s…

Popular Features

  • Multi-platform scheduling (9): 8.8 (88%)
  • Central monitoring (9): 8.4 (84%)
  • Logging (9): 8.1 (81%)
  • Alerts and notifications (9): 7.9 (79%)

Pricing

N/A
Unavailable

Entry-level set up fee?

  • No setup fee

Offerings

  • Free Trial
  • Free/Freemium Version
  • Premium Consulting/Integration Services

Alternatives Pricing

N/A
Unavailable
What is Control-M?

Control-M from BMC is a platform for integrating, automating, and orchestrating application and data workflows in production across complex hybrid technology ecosystems. It provides deep operational capabilities, delivering speed, scale, security, and governance.

What is Superblocks?

Superblocks is an IDE for internal tooling: a programmable set of building blocks for developers to create mission-critical internal operational software. The Superblocks Application Builder is used to assemble flexible components and connect to databases and APIs. Users can create REST, GraphQL, and gRPC…


Product Demos

Getting Started with Apache Airflow

YouTube

Apache Airflow | Build your custom operator for twitter API

YouTube

Features

Workload Automation

Workload automation tools manage event-based scheduling and resource management across a wide variety of applications, databases, and architectures.

8.2
Avg 8.2

Product Details

What is Apache Airflow?

Apache Airflow Video

What's coming in Airflow 2.0?

Apache Airflow Technical Details

Operating Systems: Unspecified
Mobile Application: No

Frequently Asked Questions

Apache Airflow is an open-source tool that can be used to programmatically author, schedule, and monitor data pipelines using Python and SQL. Created at Airbnb as an open-source project in 2014, Airflow was brought into the Apache Software Foundation’s Incubator Program in 2016 and announced as a Top-Level Apache Project in 2019. It is used as a data orchestration solution, with over 140 integrations and community support.

Reviewers rate Multi-platform scheduling highest, with a score of 8.8.

The most common users of Apache Airflow are from Enterprises (1,001+ employees).

Reviews and Ratings

(35)

Community Insights

TrustRadius Insights are summaries of user sentiment data from TrustRadius reviews and, when necessary, 3rd-party data sources.

Apache Airflow has proven to be a versatile solution for managing and orchestrating various data tasks. Users have utilized this product as a core component for scheduling and monitoring scheduled jobs, inspecting job successes and failures, and troubleshooting errors or failures. It has also been extensively employed in GCP as part of Cloud Composer for running ETL jobs, streamlining data pipelines, and creating workflows for analytics and reporting.

Reviewers have found Apache Airflow easy to configure and set up, making it well suited to orchestrating data flows and building enterprise data pipelines. Its ability to integrate with third-party solutions via APIs allows for seamless data access and integration. Users have also appreciated the product's capability to manage ETL pipelines and programmatically monitor data pipelines.

Another valuable use case of Apache Airflow is its role in creating workflows, orchestrating data pipelines, and automating tasks. Its flexibility has been particularly beneficial when dealing with complex data pipelines from diverse sources. Furthermore, the product has been effective in performing data integration in AWS S3 region, connecting to relational databases, executing data extracts, and compiling them into multiple flat file segments.
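The extract-and-segment pattern mentioned above can be sketched in plain Python. This is only an illustration of the idea, not code from any reviewer and not Airflow's own API: the `extract_to_segments` helper, the `orders` table, and the segment size are all hypothetical, an in-memory SQLite database stands in for the relational source, and the segments are kept as strings rather than uploaded to S3.

```python
import csv
import io
import sqlite3


def extract_to_segments(conn, query, rows_per_segment):
    """Run a query and split the result into CSV segments.

    Each segment repeats the header row and holds at most
    rows_per_segment data rows. (Hypothetical helper for illustration.)
    """
    cursor = conn.execute(query)
    header = [col[0] for col in cursor.description]
    segments = []
    while True:
        rows = cursor.fetchmany(rows_per_segment)
        if not rows:
            break
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)
        segments.append(buffer.getvalue())
    return segments


# Stand-in relational source: an in-memory SQLite table with five rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(i, i * 1.5) for i in range(1, 6)],
)

# Five rows at two rows per segment yields three flat-file segments.
segments = extract_to_segments(
    conn, "SELECT id, amount FROM orders ORDER BY id", rows_per_segment=2
)
```

In an orchestrated pipeline, a function like this would typically be one task, with a downstream task responsible for writing each segment to its destination.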

Apache Airflow brings standardization and modularity to data pipelines, enabling the implementation of complex pipelines and facilitating the sharing of data with partners as well as scoring machine learning models. Overall, users have found this product to be a valuable tool for managing data tasks efficiently and effectively.

Based on user reviews, here are the most common recommendations for Apache Airflow:

  1. Read the documentation and take an introduction course to fully understand Airflow's behavior and close any knowledge gaps.

  2. Consider Airflow as a first choice for ETL tasks that require programming. However, keep in mind that the coding aspect may not be suitable for all ETL engineers.

  3. Replace cron jobs with Airflow for better results, utilizing its scheduling and dependency management features.

Overall, these recommendations emphasize the importance of familiarizing oneself with the documentation, leveraging Airflow's capabilities for programming-centric ETL tasks, and using it to replace traditional cron jobs.
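To make the third recommendation concrete, here is a minimal sketch of what dependency management adds over cron, which only knows about clock times. It deliberately uses the Python standard library (`graphlib`, Python 3.9+) rather than Airflow's API, and the task names and the `run_pipeline` helper are assumptions made for illustration.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run every task after all of its upstream tasks have run.

    tasks maps a task name to a callable; dependencies maps a task
    name to the set of task names it depends on. Returns the run order.
    """
    order = []
    for name in TopologicalSorter(dependencies).static_order():
        tasks[name]()
        order.append(name)
    return order


log = []
tasks = {
    "extract": lambda: log.append("extracted"),
    "transform": lambda: log.append("transformed"),
    "load": lambda: log.append("loaded"),
}

# "transform" needs "extract" first; "load" needs "transform" first.
dependencies = {"transform": {"extract"}, "load": {"transform"}}

run_order = run_pipeline(tasks, dependencies)
```

With cron, the same ordering would have to be approximated by spacing jobs apart in time. A workflow scheduler such as Airflow expresses the ordering directly and layers scheduling, retries, and monitoring on top.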

Reviews

Score 9 out of 10
Vetted Review
Verified User
Incentivized
We use Apache Airflow to streamline data pipelines, create workflows according to the needs of the project, and monitor overall functionality. In addition, we use Apache Airflow to retrieve data from Hive before creating the workflow in its entirety. It is also used for automation.
  • In charge of the ETL processes.
  • As there is no incoming or outgoing data, we may handle the scheduling of tasks as code and avoid the requirement for monitoring.
  • There is no way to assess the processes because they do not keep the metadata.
  • Python is currently the only language supported for creating programmed pipelines.
  • They need to implement both event-based and time-based scheduling.
I handle our pipeline scheduling and monitoring. I had minimal problems with Apache Airflow. It's well-suited for data engineers who are responsible for the creation of the data workflows. It is also best suited for the scheduling of the workflow; it allows us to execute Python scripts as well. Finally, Apache Airflow is best suited for the circumstances in which we need a scalable solution.
Workload Automation (6): 9.5 (95%)
Multi-platform scheduling: 10.0 (100%)
Central monitoring: 10.0 (100%)
Logging: 9.0 (90%)
Alerts and notifications: 9.0 (90%)
Analysis and visualization: 10.0 (100%)
Application integration: 9.0 (90%)
  • Most of the ETL processes were automated, cutting down on human labor.
  • Apache Airflow's user interface (UI) was very informative and straightforward.
  • Since ETL processes were providing data via airflow, we were able to gain a deeper comprehension of the data at hand.
Multiple DAGs can be orchestrated simultaneously at varying times, and runs can be reproduced or replicated with relative ease. Overall, Apache Airflow is easier to use than other solutions now on the market. It is simple to integrate, and workflows can be monitored and scheduled quickly. We advocate using this tool for automating data pipelines or processes.
Score 7 out of 10
Vetted Review
Verified User
Incentivized
We use Apache Airflow as part of our DAG scheduler and health monitoring tool. It serves as a core component in ensuring our scheduled jobs run, lets us inspect job successes and failures, and acts as a troubleshooting tool in the event of job errors or failures. It has been a core tool and we are happy with what it does.
  • Job scheduling - Pretty straightforward in terms of UI.
  • Job monitoring - Dashboard is as straightforward as it gets.
  • Troubleshooting jobs - ability to dive into detailed errors and navigate the job workflow.
  • UI/Dashboard can be updated to be customisable, and jobs summary in groups of errors/failures/success, instead of each job, so that a summary of errors can be used as a starting point for reviewing them.
  • Navigation - It's a bit dated. Could do with more modern web navigation UX. i.e. sidebars navigation instead of browser back/forward.
  • Again core functional reorg in terms of UX. Navigation can be improved for core functions as well, instead of discovery.
For quickly scanning job status and deep-diving into job issues, details, and flows, Airflow does a good job. No fuss, no muss. The learning curve is low because the UI is very straightforward, and navigating it becomes familiar after some time using it. Our requirements are pretty simple: job scheduling, workflows, and monitoring. The jobs we run are >100, but that is still a lot to review and troubleshoot when jobs don't run. So when managing a large number of jobs, Airflow's dated UI can be a bit of a drawback.
Workload Automation (6): 8.8 (88.3%)
Multi-platform scheduling: 10.0 (100%)
Central monitoring: 10.0 (100%)
Logging: 10.0 (100%)
Alerts and notifications: 8.0 (80%)
Analysis and visualization: 7.0 (70%)
Application integration: 8.0 (80%)
  • It is a good workflow job scheduler.
  • It meets most, if not all, of our organization's product requirements.
  • Airflow's stability and reliability are unmatched.
Jenkins and Kafka are not meant for the same purpose, although they might seem similar. I would say Airflow is really what it says on the can: workflow management. For our organisation, the purpose is clear. So long as your aim is to have a rich workflow scheduler and job management, Airflow is the go-to. Use the tool for what it's meant for, and it will meet your needs for sure.
Jenkins, Apache Kafka, Redis™*
Score 7 out of 10
Vetted Review
Verified User
Incentivized
We use Apache Airflow in GCP as part of Cloud Composer to run all our ETL jobs.
  • schedule jobs
  • graphing job flow and dependencies and retries
  • Nice UI for visualization
  • Instead of using a Storage bucket as a source, it would be nice if DAGs could be pulled directly from a private Git repo
  • Upgrade process could be smoother
If you are using GCP, you can use Apache Airflow very easily by using Cloud Composer which is the managed service for Airflow. If you need to deploy it yourself, installation and setup could be tricky.
Workload Automation (6): 6.5 (65%)
Multi-platform scheduling: 9.0 (90%)
Central monitoring: 7.0 (70%)
Logging: 5.0 (50%)
Alerts and notifications: 5.0 (50%)
Analysis and visualization: 6.0 (60%)
Application integration: 7.0 (70%)
  • Triggering ETL jobs based on events
  • running DAGs in a scheduled fashion
April 04, 2022

Apache Airflow

PRABHAT MISHRA | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User
Incentivized
We are using Apache Airflow to manage our ETL pipelines and to monitor the data pipeline programmatically. I have been helping the data team create pipelines using Apache Airflow.
  • We use it as our workflow management system.
  • Managing the ETL pipelines.
  • We can manage task scheduling as code and need not monitor it, as there is no data in and out.
  • They should bring in time-based scheduling too, not only event-based.
  • They do not store the metadata, so we are not able to analyze the workflows.
  • They only support Python as of now for writing scripted pipelines.
We were using it to manage the workflows for the ETL pipelines as code, so Airflow was very helpful.
Workload Automation (6): 8.3 (83.3%)
Multi-platform scheduling: 8.0 (80%)
Central monitoring: 7.0 (70%)
Logging: 8.0 (80%)
Alerts and notifications: 9.0 (90%)
Analysis and visualization: 9.0 (90%)
Application integration: 9.0 (90%)
  • We had a better understanding of the data, as ETL pipelines were delivering data through Airflow.
  • We were able to automate most of the ETL pipelines, which reduced manual effort.
  • The Airflow UI was extremely helpful and easy to understand.
In my use case, Airflow was best suited for designing ETL pipelines in a scripted manner for workflows, and the UI was very good and easy to use.