TrustRadius: an HG Insights company

Apache Airflow Reviews & Insights

Score: 8.7 out of 10

58 Reviews and Ratings

Community insights

TrustRadius Insights for Apache Airflow are summaries of user sentiment data from TrustRadius reviews and, when necessary, third party data sources.

Business Problems Solved

Apache Airflow has proven to be a versatile solution for managing and orchestrating various data tasks. Users have utilized this product as a core component for scheduling and monitoring scheduled jobs, inspecting job successes and failures, and troubleshooting errors or failures. It has also been extensively employed in GCP as part of Cloud Composer for running ETL jobs, streamlining data pipelines, and creating workflows for analytics and reporting.

Reviewers have found Apache Airflow easy to configure and set up, making it well suited to orchestrating data flows and building enterprise data pipelines. Its ability to integrate with third-party solutions via APIs allows for seamless data access and integration. Users have also appreciated the product's capability to manage ETL pipelines and programmatically monitor data pipelines.

Another valuable use case of Apache Airflow is its role in creating workflows, orchestrating data pipelines, and automating tasks. Its flexibility has been particularly beneficial when dealing with complex data pipelines from diverse sources. Furthermore, the product has been effective in performing data integration in an AWS S3 region, connecting to relational databases, executing data extracts, and compiling them into multiple flat file segments.

Apache Airflow brings standardization and modularity to data pipelines, enabling the implementation of complex pipelines and facilitating the sharing of data with partners as well as scoring machine learning models. Overall, users have found this product to be a valuable tool for managing data tasks efficiently and effectively.

Reviews

12 Reviews

Apache Airflow: master of schedulers and orchestrators

Rating: 8 out of 10
Incentivized

Use Cases and Deployment Scope

Apache Airflow is one of the best orchestrators on the market. It gives us the flexibility to orchestrate our data engineering workflows, with various levels of customization possible through Python programming. It lets us connect to various cloud providers such as Google, AWS, and Azure, which enables teams to work in a cross-cloud environment.

Pros

  • Provides connections to different cloud providers
  • Good access management
  • Good user interface for users to interact with, for example when we need to pause a DAG, trigger it manually, or mark a task as successful

Cons

  • Needs a local "dry run" or IDE plugin that can validate and simulate DAG execution without a full environment.
  • Better feedback on DAG parse errors in the UI or CLI.
  • Navigating large DAGs with hundreds of tasks can be slow and hard to understand visually.

Likelihood to Recommend

Apache Airflow is well suited when we want to connect to different cloud providers from the same interface. It is useful when we have to run batch pipelines that pull data from transactional systems and apply business rules. It provides rich scheduling for batch pipelines with the help of cron expressions. However, it is not good for real-time processing, because tasks carry overhead and can take several seconds or minutes to start.
Vetted Review
Apache Airflow
5 years of experience
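
The cron-based scheduling this reviewer mentions can be illustrated with a small sketch. This is not Airflow's own parser: it is a simplified, standard-library-only matcher for five-field cron expressions, and the names `field_matches` and `cron_matches` are hypothetical. It assumes numeric fields only (no month or weekday names) and requires all five fields to match, which differs from standard cron when both day-of-month and day-of-week are restricted.

```python
from datetime import datetime

# (lowest, highest) allowed value for each of the five cron fields, in order:
# minute, hour, day of month, month, day of week (0 = Sunday).
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def field_matches(field: str, value: int, lo: int, hi: int) -> bool:
    """Check one cron field; supports '*', 'a', 'a-b', comma lists, and '/step'."""
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = lo, hi
        elif "-" in base:
            start_text, end_text = base.split("-")
            start, end = int(start_text), int(end_text)
        else:
            start = int(base)
            # 'a/step' runs from a to the top of the range; plain 'a' is one value.
            end = hi if step_text else start
        if start <= value <= end and (value - start) % step == 0:
            return True
    return False


def cron_matches(expression: str, moment: datetime) -> bool:
    """Return True if `moment` falls on a minute selected by `expression`."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected five space-separated cron fields")
    values = [
        moment.minute,
        moment.hour,
        moment.day,
        moment.month,
        (moment.weekday() + 1) % 7,  # Python: Monday = 0; cron: Sunday = 0
    ]
    return all(
        field_matches(field, value, lo, hi)
        for field, value, (lo, hi) in zip(fields, values, FIELD_RANGES)
    )


# Hypothetical schedule: every day at 06:00.
daily_at_six = "0 6 * * *"
print(cron_matches(daily_at_six, datetime(2024, 1, 1, 6, 0)))
print(cron_matches(daily_at_six, datetime(2024, 1, 1, 6, 1)))
```

A scheduler built on this idea checks at most once per minute, which is consistent with the reviewer's point that this style of scheduling fits batch work better than real-time processing.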

One Stop solution for all the Orchestration needs.

Rating: 10 out of 10
Incentivized

Use Cases and Deployment Scope

I am part of the data platform team, where we are responsible for building the platform for data ingestion, an aggregation system, and the compute engines. Apache Airflow is one of the core systems responsible for orchestrating pipelines and scheduled workflows. We have multiple deployments of Apache Airflow running for different use cases, each with a workflow of 5,000 to 9,000 DAGs and executing even more DAGs. Apache Airflow now also offers high availability (HA) with scheduler replicas, which is a lifesaver and is well maintained by the community.

Pros

  • Apache Airflow is one of the best Orchestration platforms and a go-to scheduler for teams building a data platform or pipelines.
  • Apache Airflow supports multiple operators, such as the Databricks, Spark, and Python operators. All of these provide us with functionality to implement any business logic.
  • Apache Airflow is highly scalable, and we can run a large number of DAGs with ease. It provides HA and replication for workers. Maintaining Airflow deployments is very easy, even for smaller teams, and we also get lots of metrics for observability.

Cons

  • Achieving a production-ready deployment of Apache Airflow requires some level of expertise. A repository of officially maintained sample Helm chart configurations would be handy for a new team.
  • As Airflow is used to build many data pipelines, a feature for building lineage from the queries run on different compute engines would help in developing the data catalog. Typically, multiple tools are required for this use case.
  • For building a data pipeline from upstream to downstream tables, using Airflow with lineage to trigger the downstream DAGs after recovery would be helpful. Additionally, being able to create dependencies between DAGs would be beneficial.

Likelihood to Recommend

Airflow is well suited for data engineering pipelines, creating scheduled workflows, and working with various data sources. You can implement almost any kind of DAG for any use case using the different operators, or implement your own logic with ease using the Python operator. The MLOps features of Airflow could be enhanced to match MLflow-like features, making Airflow the go-to solution for all workloads, from data science to data engineering.

Scalable Scheduling Framework and Orchestration tool

Rating: 9 out of 10
Incentivized

Use Cases and Deployment Scope

We use Apache Airflow as an orchestration tool for data engineering workflows in a gaming product.
We schedule multiple jobs at hourly, daily, weekly, and monthly intervals.
We have many requirements for dependent jobs, e.g., job1 must run before job2, and Apache Airflow handles this very smoothly. We use multiple Apache Airflow integrations with webhooks and APIs. Additionally, we do a lot of job monitoring and track SLA misses using Apache Airflow features.
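
The "job1 must run before job2" requirement is a dependency-ordering problem. The sketch below shows the underlying idea with Python's standard-library `graphlib` rather than Airflow itself; the job names and the `run_order` helper are hypothetical. In Airflow, the same relationship is commonly declared between tasks with the `>>` operator (for example `job1 >> job2`).

```python
from graphlib import TopologicalSorter

# Hypothetical jobs: each key maps to the set of jobs that must finish first.
dependencies = {
    "job2": {"job1"},
    "job3": {"job1"},
    "report": {"job2", "job3"},
}


def run_order(deps: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order that respects every dependency."""
    # static_order() raises graphlib.CycleError if the jobs depend on each
    # other in a loop, which is why such workflows must be acyclic (a DAG).
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

Here `job1` always comes first and `report` always comes last; `job2` and `job3` have no dependency on each other, so either may run before the other.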

Pros

  • Job scheduling
  • Dependent job workflows
  • Failure handling and rerun of workflows
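
The failure handling and rerun behavior listed above can be sketched in plain Python. This is an illustration of the retry idea only, not Airflow's implementation; `run_with_retries` and `flaky_extract` are hypothetical names. Airflow exposes comparable behavior through task arguments such as `retries` and `retry_delay`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(
    task: Callable[[], T], retries: int = 3, delay_seconds: float = 0.0
) -> T:
    """Run `task`, re-running it up to `retries` extra times if it raises."""
    attempt = 0
    while True:
        try:
            return task()
        except Exception:
            attempt += 1
            if attempt > retries:
                raise  # retries exhausted: surface the last failure
            time.sleep(delay_seconds)


# Hypothetical task that fails twice with a transient error, then succeeds.
calls = {"count": 0}


def flaky_extract() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


result = run_with_retries(flaky_extract, retries=3)
print(result, calls["count"])
```

The task is attempted three times in total: two failures followed by one success, which stays within the allowed three retries.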

Cons

  • Better User Interface

Likelihood to Recommend

Dependent Job scheduling
Rerun mechanism of workflows
High availability deployment strategies

We used it to manage processes for ETL pipelines

Rating: 9 out of 10
Incentivized

Use Cases and Deployment Scope

We use Apache Airflow to streamline data pipelines, create workflows according to the needs of the project, and monitor overall functionality. In addition, we use Apache Airflow to solve the problem of retrieving data from Hive before creating the workflow in its entirety. It's also used for automation.

Pros

  • In charge of the ETL processes.
  • As there is no incoming or outgoing data, we may handle the scheduling of tasks as code and avoid the requirement for monitoring.

Cons

  • There is no way to assess the processes because they do not keep the metadata.
  • Python is currently the only language supported for creating programmed pipelines.
  • They need to implement both event-based and time-based scheduling.

Likelihood to Recommend

I handle our pipeline scheduling and monitoring. I had minimal problems with Apache Airflow. It's well-suited for data engineers who are responsible for the creation of the data workflows. It is also best suited for the scheduling of the workflow; it allows us to execute Python scripts as well. Finally, Apache Airflow is best suited for the circumstances in which we need a scalable solution.
Vetted Review
Apache Airflow
3 years of experience

Apache Airflow - Love the Features, Love the Reliability... Would Love It if the UI Got Modernized!

Rating: 7 out of 10
Incentivized

Use Cases and Deployment Scope

We use Apache Airflow as our DAG scheduler and health monitoring tool. It serves as a core component in ensuring our scheduled jobs run, lets us inspect job successes and failures, and acts as a troubleshooting tool in the event of job errors or failures. It has been a core tool and we are happy with what it does.

Pros

  • Job scheduling - Pretty straightforward in terms of UI.
  • Job monitoring - Dashboard is as straightforward as it gets.
  • Troubleshooting jobs - ability to dive into detailed errors and navigate the job workflow.

Cons

  • The UI/dashboard could be made customisable, with job summaries grouped by errors/failures/successes instead of listed per job, so that a summary of errors can serve as a starting point for reviewing them.
  • Navigation - It's a bit dated. It could do with more modern web navigation UX, e.g., sidebar navigation instead of the browser's back/forward buttons.
  • The UX for core functions could also be reorganized, so that they are reachable directly through navigation instead of through discovery.

Likelihood to Recommend

For quickly scanning job status and deep-diving into job issues, details, and flows, Airflow does a good job. No fuss, no muss. The learning curve is low because the UI is very straightforward, and navigating it becomes familiar after spending some time using it. Our requirements are pretty simple: job scheduling, workflows, and monitoring. The number of jobs we run is >100, which is still a lot to review and troubleshoot when jobs don't run. So when managing a large number of jobs, Airflow's dated UI can be a bit of a drawback.

Apache Airflow is flawless

Rating: 9 out of 10
Incentivized

Use Cases and Deployment Scope

We use Apache Airflow to perform data integration in an AWS S3 region. With this we are able to connect to a relational database, easily execute data extracts, and compile them all into multiple flat file segments. Airflow brings a lot of standardization as well as modularity. We also use it to send data to partners and score ML models. It allows us to implement complex data pipelines easily.

Pros

  • Multiple helpful features
  • Very intuitive flow charts
  • Reruns and backfills are very easy
  • SLA and DAGs are easy to set up

Cons

  • Potentially a steep learning curve
  • The browser UI could do with a few enhancements

Likelihood to Recommend

Using Apache Airflow has been extremely helpful, as it means we can get to our endgame faster. This product has enabled us to translate our ideas into projects at a much faster speed than before we had this software. We manage data ingestion and modeling for multiple products and customers within each product. Each has its own pipeline with its own code.
Vetted Review
Apache Airflow
2 years of experience

A great solution to help orchestrate workflows and pipelines

Rating: 9 out of 10
Incentivized

Use Cases and Deployment Scope

Apache Airflow is a great way to orchestrate workflows and build enterprise data pipelines. It is very easy to configure and set up, and it would be my go-to solution for orchestrating data flows. We use Airflow to integrate our solution via APIs and to allow third-party solutions to access our solution and the data held within it.

Pros

  • Orchestrate workflows
  • Visualise workflows easily using DAGs
  • Integrate 3rd party data sources

Cons

  • Visualisation UI could be improved in my opinion.
  • Enterprise features
  • Performance improvements in bigger deployments.

Likelihood to Recommend

Well suited for anyone that wants to orchestrate data pipelines and workflows. Good for developing, scheduling, and monitoring data workflows and is capable of managing complex enterprise workloads and pipelines. The visual aspect of understanding how your workflows are inter-connected is especially useful.

Apache Airflow software

Rating: 9 out of 10
Incentivized

Use Cases and Deployment Scope

Apache Airflow is used for the scheduling and orchestration of data pipelines or workflows. Orchestration of data pipelines refers to the sequencing, coordination, scheduling, and managing of complex data pipelines from diverse sources. It is also helpful when your data pipelines change slowly (days or weeks – not hours or minutes), are related to a specific time interval, or are pre-scheduled.

Pros

  • Scheduling of data pipelines or workflows.
  • Orchestration of data pipelines or workflows.

Cons

  • Not intuitive for new users.
  • Setting up Airflow architecture for production is NOT easy.

Likelihood to Recommend

Ease of use: you only need a little Python knowledge to get started. Open-source community: Airflow is free and has a large community of active users. Apache Airflow is used for the scheduling and orchestration of data pipelines or workflows. Orchestration of data pipelines refers to the sequencing, coordination, scheduling, and managing of complex data pipelines from diverse sources.
Vetted Review
Apache Airflow
1 year of experience

Apache Airflow for Startups

Rating: 8 out of 10
Incentivized

Use Cases and Deployment Scope

Used Airflow for Analytics & Reporting

Pros

  • Reports
  • Sending Bulk Email/Notification
  • Processing from different data sources

Cons

  • Improve the GUI Control Panel
  • Provide more examples and documentation
  • Improvement in debugging

Likelihood to Recommend

Very well suited for building ETL and automated report generation, as the workflow steps can be well defined and debugging is minimal. It can also be used for sending bulk email/SMS/push notifications.

But when more complex workflows have to be implemented where the response of a task can create multiple branches and there are multiple feedback loops, the tool can become tedious.

A very nice job scheduler of DAGs that could become even better

Rating: 7 out of 10
Incentivized

Use Cases and Deployment Scope

We use Apache Airflow in GCP as part of Cloud Composer to run all our ETL jobs.

Pros

  • Scheduling jobs
  • Graphing job flow, dependencies, and retries
  • Nice UI for visualization

Cons

  • Instead of using a storage bucket as the source, it would be nice if DAGs could be pulled directly from a private Git repo
  • Upgrade process could be smoother

Likelihood to Recommend

If you are using GCP, you can use Apache Airflow very easily by using Cloud Composer which is the managed service for Airflow. If you need to deploy it yourself, installation and setup could be tricky.