TrustRadius Insights for Apache Airflow are summaries of user sentiment data from TrustRadius reviews and, when necessary, third party data sources.
Business Problems Solved
Apache Airflow has proven to be a versatile solution for managing and orchestrating various data tasks. Users have utilized this product as a core component for scheduling and monitoring scheduled jobs, inspecting job successes and failures, and troubleshooting errors or failures. It has also been extensively employed in GCP as part of Cloud Composer for running ETL jobs, streamlining data pipelines, and creating workflows for analytics and reporting.
Reviewers have found Apache Airflow easy to configure and set up, making it ideal for orchestrating data flows and building enterprise data pipelines. Its ability to integrate with third-party solutions via APIs allows for seamless data access and integration. Users have also appreciated the product's capability to manage ETL pipelines and programmatically monitor data pipelines.
Another valuable use case of Apache Airflow is its role in creating workflows, orchestrating data pipelines, and automating tasks. Its flexibility has been particularly beneficial when dealing with complex data pipelines from diverse sources. Furthermore, the product has been effective in performing data integration in an AWS S3 region, connecting to relational databases, executing data extracts, and compiling them into multiple flat file segments.
Apache Airflow brings standardization and modularity to data pipelines, enabling the implementation of complex pipelines and facilitating the sharing of data with partners as well as scoring machine learning models. Overall, users have found this product to be a valuable tool for managing data tasks efficiently and effectively.
Apache Airflow is one of the best orchestrators on the market. It gives us the flexibility to orchestrate our data engineering workflows, with various levels of customization possible through Python programming. It allows us to connect with various cloud providers like Google, AWS, and Azure, which enables teams to work in a cross-cloud environment.
Pros
Provides connections to different cloud providers
Good access management
Good user interface for users to interact with, for example when we need to pause a DAG, trigger it manually, or mark a task as successful
Cons
A local "dry run" or IDE plugin that can validate and simulate DAG execution without needing a full environment.
Better feedback on DAG parse errors in the UI or CLI.
Navigating large DAGs with hundreds of tasks can be slow and hard to understand visually.
Likelihood to Recommend
Apache Airflow is well suited when we want to connect to different cloud providers from the same interface. It is useful when we have to run batch pipelines that pull data from transactional systems and apply business rules. It provides rich scheduling for batch pipelines with the help of cron expressions; however, it is not good for real-time processing because tasks have overhead and can take several seconds or minutes to start.
I am part of the data platform team, where we are responsible for building the platform for data ingestion, an aggregation system, and the compute engines. Apache Airflow is one of the core systems responsible for orchestrating pipelines and scheduled workflows. We have multiple deployments of Apache Airflow running for different use cases, each with a workflow of 5,000 to 9,000 DAGs and executing even more DAGs. Apache Airflow now also offers HA with scheduler replicas, which is a lifesaver, and it is well maintained by the community.
Pros
Apache Airflow is one of the best Orchestration platforms and a go-to scheduler for teams building a data platform or pipelines.
Apache Airflow supports multiple operators, such as the Databricks, Spark, and Python operators. All of these provide us with functionality to implement any business logic.
Apache Airflow is highly scalable, and we can run a large number of DAGs with ease. It provides HA and replication for workers. Maintaining Airflow deployments is very easy, even for smaller teams, and we also get lots of metrics for observability.
Cons
To achieve a production-ready deployment of Apache Airflow, you need some level of expertise. A repository of officially maintained sample configurations for Helm charts would be handy for a new team.
As Airflow is used to build many data pipelines, a feature for building lineage from the queries run on different compute engines would help in developing the data catalog. Typically, multiple tools are required for this use case.
For building a data pipeline from upstream to downstream tables, using Airflow with lineage to trigger the downstream DAGs after recovery will be helpful. Additionally, creating a dependency between the DAGs would be beneficial.
Likelihood to Recommend
Airflow is well-suited for data engineering pipelines, creating scheduled workflows, and working with various data sources. You can implement almost any kind of DAG for any use case using the different operators, or implement your own logic with ease using the Python operator. The MLOps features of Airflow could be enhanced to match MLflow-like features, making Airflow the go-to solution for all workloads, from data science to data engineering.
We use Apache Airflow as an orchestration tool for data engineering workflows in a gaming product. We schedule multiple jobs at hourly, daily, weekly, and monthly intervals. We have many requirements for dependent jobs (i.e., job1 must run before job2), and Apache Airflow handles this very swiftly. We use multiple Apache Airflow integrations with webhooks and APIs. Additionally, we do a lot of job monitoring and SLA-miss tracking via Apache Airflow features.
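The "job1 must run before job2" requirement described in this review is a dependency-ordering problem. A minimal plain-Python sketch of the idea follows. It uses the standard library's `graphlib` rather than Airflow itself, and the job names and the `run_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical jobs: each key lists the jobs that must finish before it starts.
dependencies = {
    "job1": set(),
    "job2": {"job1"},
    "report": {"job2"},
}


def run_order(deps: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

In Airflow itself, this kind of ordering is usually declared between tasks with the `>>` operator, for example `task1 >> task2`.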
Pros
Job scheduling
Dependent job workflows
Failure handling and rerun of workflows
Cons
Better User Interface
Likelihood to Recommend
Dependent job scheduling
Rerun mechanism of workflows
High availability deployment strategies
We use Apache Airflow to streamline the data pipelines, create workflows according to the needs of the project and overall monitoring of the functionality itself. In addition, we are using Apache Airflow to solve the problem of retrieving data from Hive before creating the workflow in its entirety. It's also utilized for automation.
Pros
In charge of the ETL processes.
As there is no incoming or outgoing data, we may handle the scheduling of tasks as code and avoid the requirement for monitoring.
Cons
There is no way to assess the processes because they do not keep the metadata.
Python is currently the only language supported for creating programmed pipelines.
They need to implement both event-based and time-based scheduling.
Likelihood to Recommend
I handle our pipeline scheduling and monitoring. I had minimal problems with Apache Airflow. It's well-suited for data engineers who are responsible for the creation of the data workflows. It is also best suited for the scheduling of the workflow; it allows us to execute Python scripts as well. Finally, Apache Airflow is best suited for the circumstances in which we need a scalable solution.
Verified User
Engineer in Information Technology (10,001+ employees)
We use Apache Airflow as our DAG scheduler and health monitoring tool. It serves as a core component in ensuring our scheduled jobs are run, in allowing us to inspect job successes and failures, and as a troubleshooting tool in the event of job errors or failures. It has been a core tool and we are happy with what it does.
Pros
Job scheduling - Pretty straightforward in terms of UI.
Job monitoring - Dashboard is as straightforward as it gets.
Troubleshooting jobs - ability to dive into detailed errors and navigate the job workflow.
Cons
The UI/dashboard could be made customisable, with job summaries grouped by errors/failures/successes instead of listed per job, so that a summary of errors can serve as a starting point for reviewing them.
Navigation is a bit dated. It could do with more modern web navigation UX, e.g. sidebar navigation instead of browser back/forward.
Again, a reorganisation of core functions in terms of UX. Navigation to core functions could be improved as well, so that they do not have to be discovered.
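The grouped summary this reviewer asks for can be sketched in a few lines. This is only an illustration in plain Python, not an Airflow feature or API; the job names, states, and the `summarize` helper are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical (job name, final state) pairs, e.g. exported from a scheduler.
job_states = [
    ("ingest_orders", "success"),
    ("ingest_users", "failed"),
    ("build_report", "success"),
    ("send_emails", "failed"),
    ("cleanup", "skipped"),
]


def summarize(states):
    """Count jobs per state and list the job names under each state."""
    counts = Counter(state for _, state in states)
    by_state = defaultdict(list)
    for name, state in states:
        by_state[state].append(name)
    return counts, dict(by_state)


counts, by_state = summarize(job_states)
# The failed group gives a starting point for review.
print(counts["failed"], by_state["failed"])
```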
Likelihood to Recommend
For quickly scanning job status and deep-diving into job issues, details, and flows, Airflow does a good job. No fuss, no muss. The learning curve is low, as the UI is very straightforward and navigating it becomes familiar after spending some time using it. Our requirements are pretty simple: job scheduling, workflows, and monitoring. The jobs we run number >100, but that is still a lot to review and troubleshoot when jobs don't run. So when managing a large number of jobs, Airflow's dated UI can be a bit of a drawback.
We use Apache Airflow to perform data integration in an AWS S3 region. With this we are able to connect to a relational database, easily execute data extracts, and compile them all into multiple flat file segments. Airflow brings a lot of standardization as well as modularity. We also use it to send data to partners and score ML models. It allows us to implement complex data pipelines easily.
Pros
Multiple helpful features
Very intuitive flow charts
Reruns and backfills are very easy
SLA and DAGs are easy to set up
Cons
Potentially a steep learning curve
The browser UI could do with a few enhancements
Likelihood to Recommend
Using Apache Airflow has been extremely helpful, as it means we can get to our endgame faster. This product has enabled us to translate our ideas into projects at a much faster speed than before we had this software. We manage data ingestion and modeling for multiple products and customers within each product. Each has its own pipeline with its own code.
Verified User
Engineer in Information Technology (5001-10,000 employees)
Apache Airflow is a great way to orchestrate workflows and build enterprise data pipelines. It is very easy to configure and set up, and it would be my go-to solution for orchestrating data flows. We use Airflow to integrate our solution via APIs and to allow third-party solutions to access our solution and the data held within it.
Pros
Orchestrate workflows
Visualise workflows easily using DAG
Integrate 3rd party data sources
Cons
Visualisation UI could be improved in my opinion.
Enterprise features
Performance improvements in bigger deployments.
Likelihood to Recommend
Well suited for anyone that wants to orchestrate data pipelines and workflows. Good for developing, scheduling, and monitoring data workflows and is capable of managing complex enterprise workloads and pipelines. The visual aspect of understanding how your workflows are inter-connected is especially useful.
Apache Airflow is used for the scheduling and orchestration of data pipelines or workflows. Orchestration of data pipelines refers to the sequencing, coordination, scheduling, and managing of complex data pipelines from diverse sources. It is also helpful when your data pipelines change slowly (days or weeks – not hours or minutes), are related to a specific time interval, or are pre-scheduled.
Pros
Scheduling of data pipelines or workflows.
Orchestration of data pipelines or workflows.
Cons
Not intuitive for new users.
Setting up Airflow architecture for production is NOT easy.
Likelihood to Recommend
Ease of use: you only need a little Python knowledge to get started. Open-source community: Airflow is free and has a large community of active users.
Very well suited for building ETL and automated report generation, as the workflow steps can be well defined and debugging is minimal. It can also be used for sending bulk email/SMS/push notifications.
But when more complex workflows have to be implemented where the response of a task can create multiple branches and there are multiple feedback loops, the tool can become tedious.
We use Apache Airflow in GCP as part of Cloud Composer to run all our ETL jobs.
Pros
schedule jobs
graphing job flow and dependencies and retries
Nice UI for visualization
Cons
Instead of using a storage bucket as a source, it would be nice if the DAGs could be pulled directly from a private Git repo
Upgrade process could be smoother
Likelihood to Recommend
If you are using GCP, you can use Apache Airflow very easily by using Cloud Composer which is the managed service for Airflow. If you need to deploy it yourself, installation and setup could be tricky.