Apache Airflow: Master of Schedulers and Orchestrators
Use Cases and Deployment Scope
Apache Airflow is one of the best orchestrators on the market. It gives us the flexibility to orchestrate our data engineering workflows, with extensive customization possible through Python programming. It lets us connect to various cloud providers such as Google Cloud, AWS and Azure, which enables teams to work in a cross-cloud environment.
Pros
- Provides connections to different cloud providers
- Good access management
- A good user interface for interacting with pipelines, for example to pause a DAG, trigger it manually, or mark a task as successful
Cons
- No local "dry run" or IDE plugin that can validate and simulate DAG execution without needing a full environment.
- Feedback on DAG parse errors in the UI and CLI could be better.
- Navigating large DAGs with hundreds of tasks can be slow and hard to understand visually.
Likelihood to Recommend
Apache Airflow is well suited when we want to connect to different cloud providers from the same interface. It is useful for running batch pipelines that pull data from transactional systems and apply business rules. It provides rich scheduling for batch pipelines through cron expressions. However, it is not a good fit for real-time processing, because each task carries scheduling overhead and can take several seconds or even minutes to start.
