Apache Airflow is an open-source tool for programmatically authoring, scheduling, and monitoring data pipelines using Python and SQL.
N/A
Control-M
Score 9.3 out of 10
N/A
Control-M from BMC is a platform for integrating, automating, and orchestrating application and data workflows in production across complex hybrid technology ecosystems. It provides deep operational capabilities, delivering speed, scale, security, and governance.
$29,000
per year
Google BigQuery
Score 8.8 out of 10
N/A
Google BigQuery, part of the Google Cloud Platform, is a database-as-a-service (DBaaS) supporting the querying and rapid analysis of enterprise data.
$6.25
per TiB (after the 1st 1 TiB per month, which is free)
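As a rough illustration of the on-demand pricing listed above (first 1 TiB of scanned data per month free, then $6.25 per TiB), a quick back-of-envelope estimate can be sketched like this — rates and tiers may change, so treat the constants as assumptions to check against current pricing:

```python
# Back-of-envelope estimate of BigQuery on-demand query cost.
# Rates below mirror the figures quoted on this page; verify against
# current Google Cloud pricing before budgeting.
FREE_TIB_PER_MONTH = 1.0
PRICE_PER_TIB = 6.25

def monthly_query_cost(tib_scanned: float) -> float:
    """Return the on-demand cost in USD for a month's scanned volume."""
    billable = max(0.0, tib_scanned - FREE_TIB_PER_MONTH)
    return round(billable * PRICE_PER_TIB, 2)

print(monthly_query_cost(0.5))   # within the free tier -> 0.0
print(monthly_query_cost(10.0))  # 9 billable TiB -> 56.25
```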
We need end-to-end automation; with Control-M we can achieve this, while Azure Automation automates data pipelines across the cloud. Compared to AutoSys, Control-M has a more advanced, modern UI and simpler job lifecycle management. When compared with Apache …
Control-M is generally considered superior to AutoSys for organizations requiring a centralized, cloud-forward, and developer-friendly orchestration platform. While AutoSys remains a robust choice for on-premises batch processing, Control-M excels in hybrid multicloud …
It supports both on-prem and cloud environments. Easy to handle complex, multi-system workflows more efficiently. Helped reduce manual intervention, which has decreased errors and operational costs.
Control-M stands out in the workload automation space for its robust orchestration capabilities, especially in complex, hybrid IT environments. ActiveBatch: Known for its rich integrations and intuitive workflow designer, ActiveBatch often scores higher in ease of use and …
We have been using Control-M for almost 20 years and our teams are already trained to use it. Other tools are not as robust and resilient as Control-M. Control-M's graphical interface is very easy to use.
We would still use Control-M as overall orchestrator to manage the workflows created in these other products. We do not consider them as a replacement for Control-M.
It is a very innovative and feature-rich solution and can be used to complete many diverse tasks and solve different issues, resulting in significant time savings and cost-effectiveness.
Our ETL architecture was shaped by Dataproc as the data processing component, and we needed an easy query console with access control capabilities and less overhead in managing permissions. This drove the decision to move to Google BigQuery compared to …
We actually use Snowflake and BigQuery in tandem because they both currently meet various needs. Redshift, however, has barely been used since our migration away from it. In the case of both Snowflake and BigQuery, they beat Redshift by a long shot. The main reasons are their …
Airflow is well-suited for data engineering pipelines, creating scheduled workflows, and working with various data sources. You can implement almost any kind of DAG for any use case using the different operators, or build your own logic with the PythonOperator with ease. The MLOps side of Airflow could be enhanced to match MLflow-like features, making Airflow the go-to solution for all workloads, from data science to data engineering.
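A minimal sketch of the kind of scheduled DAG the reviewer describes, using the PythonOperator. This is declarative workflow configuration, not a definitive implementation: the dag_id, task ids, and callables are illustrative, and the `schedule` argument assumes Airflow 2.4 or later.

```python
# Minimal Airflow DAG sketch (all names illustrative, Airflow 2.4+ assumed).
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pulling data from a source system")

def transform():
    print("applying business logic")

with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    # Run transform only after extract succeeds.
    extract_task >> transform_task
```

Dropping a file like this into the DAGs folder is enough for the scheduler to pick it up and run it daily.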
Anytime you have a process that has to do multiple things, transfer data, interact with other systems, Control-M is critical. Not only does it provide the insight to what is going on, but it also lets you keep tight audit controls over access, reduces the need to spend large amounts of time tracking down issues, reduces the need to write custom "code" to do integrations with other systems and helps you better manage and track critical SLAs for workflows across the business.
Event-based data can be captured seamlessly from our data layers (and exported to Google BigQuery). When events like page-views, clicks, add-to-cart are tracked, Google BigQuery can help efficiently with running queries to observe patterns in user behaviour. That intermediate step of trying to "untangle" event data is resolved by Google BigQuery. A scenario where it could possibly be less appropriate is when analysing "granular" details (like small changes to a database happening very frequently).
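As a sketch of the kind of behavioural query described above — the project, dataset, table, and column names here are assumptions for illustration, not from the review:

```sql
-- Illustrative only: table and column names are placeholders.
-- Count add-to-cart events per day to observe patterns in user behaviour.
SELECT
  DATE(event_timestamp) AS event_date,
  COUNT(*) AS add_to_cart_events
FROM `my_project.analytics.events`
WHERE event_name = 'add_to_cart'
GROUP BY event_date
ORDER BY event_date;
```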
Apache Airflow is one of the best Orchestration platforms and a go-to scheduler for teams building a data platform or pipelines.
Apache Airflow supports multiple operators, such as the Databricks, Spark, and Python operators. All of these provide us with functionality to implement any business logic.
Apache Airflow is highly scalable, and we can run a large number of DAGs with ease. It provides HA and replication for workers. Maintaining Airflow deployments is very easy, even for smaller teams, and we also get lots of metrics for observability.
The good thing is that there are so many connectors available. Control-M provides lots of features, and we are using almost 60 to 70% of them. It gives us a great deal of capability for our daily problem-solving.
Most job creation is simple and quick and works as expected.
Testing and debugging are also very easy, and you can test multiple scenarios using temporary changes during job runs.
Log and output presentation is also very good: short yet detailed.
To monitor a specific job net, we can create a viewpoint, which can be used on a daily basis.
GSheet data can be linked to a BigQuery table, and the data in that sheet is ingested in real time into BigQuery. It's a live 'sync', which means it supports insertions, deletions, and alterations. The only limitation here is the schema; it remains static once the table is created.
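The live Sheets link described above is typically set up as an external table. A minimal DDL sketch follows — the project, dataset, table names, and spreadsheet URI are placeholders, and depending on your setup you may also need to supply an explicit schema:

```sql
-- Sketch of a live Sheets-backed BigQuery table (all identifiers are placeholders).
CREATE EXTERNAL TABLE `my_project.my_dataset.sheet_data`
OPTIONS (
  format = 'GOOGLE_SHEETS',
  uris = ['https://docs.google.com/spreadsheets/d/SPREADSHEET_ID'],
  skip_leading_rows = 1
);
```

Queries against this table read the sheet's current contents, which is what makes edits in the sheet visible immediately, while the table's schema stays fixed.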
Seamless integration with other GCP products.
A simple pipeline might look like this:
GForms -> GSheets -> BigQuery -> Looker
It all links up really well and with ease.
One instance holds many projects.
Separating data into data marts or data meshes is really easy in BigQuery, since one BigQuery instance can hold multiple projects, which are isolated collections of datasets.
The UI/dashboard could be made customisable, with job summaries grouped by errors/failures/successes instead of listing each job, so that the error summary can serve as a starting point for review.
Navigation: it's a bit dated and could do with more modern web navigation UX, e.g. sidebar navigation instead of relying on browser back/forward.
The core functions could also use a UX reorganisation; navigation should be improved for core functions as well, not just discovery.
I haven't come across too many spots where I'm not happy with the product. Most of the shortfalls were in my knowledge of the product as opposed to the actual product. Currently we're having a little bit of an issue with the deployment of the software to the servers, but it's more of an "us" problem than a product problem. I can't really give any good examples of shortfalls of the product that I've found so far.
Please expand the availability of documentation, tutorials, and community forums to provide developers with comprehensive support and guidance on using Google BigQuery effectively for their projects.
If possible, simplify the pricing model and provide clearer cost breakdowns to help users understand and plan for expenses when using Google BigQuery. Also, some cost reduction is welcome.
The process of importing data into Google BigQuery still falls short. Improving compatibility with different data formats and sources, and reducing the complexity of data ingestion workflows, would probably make it work well.
It is one of the best solutions on the market in terms of innovation, reliability, and stability. Control-M provides security when used by the largest companies in Mexico, such as banks, department stores, and logistics firms. It has proven able to integrate with new technologies on the market and provide almost 100% availability, thanks to its automatic failover scheme.
We have to use this product because our third-party supplier chose it for the backend of their data side, so it is unlikely we will move away from it unless that supplier decides to change data vendors.
For its capability to connect with multicloud environments. Access control management is something we don't get in all schedulers and orchestrators. But although it provides a lot of flexibility and options thanks to Python, some level of Python knowledge is needed to be able to build workflows.
User experience is meeting my expectations. We had a manual checklist, which Control-M Reports has now replaced, that helped us check the jobs without any issues. So, being fair with the work, the ratings should also be fair. More to come as the AI progresses; this will not only help motivate the Control-M Developers but also lead to the development of advanced technology.
I think overall it is easy to use. I haven't done anything on the development side but am more of an end user of reporting tables built in Google BigQuery. I connect data visualization tools like Tableau or Power BI to the BigQuery reporting tables to analyze trends and create complex dashboards.
Secondary Instances: Control-M supports the installation of a secondary instance of the entire Control-M environment, Control-M/EM, or Control-M/Server. Automatic & Manual Failover: In case of a failure on the primary host, Control-M can automatically fail over to the secondary host if using Oracle or MSSQL databases. Manual failover is also an option, enabling a controlled switch during planned maintenance. Fallback: After resolving the issue on the primary host, you can easily fall back to it, or even designate the secondary host as the new primary. Database Replication: For high availability, Control-M leverages database replication from the primary site to a disaster recovery site. While replication is essential, its implementation and maintenance are the user's responsibility.
I have never had any significant issues with Google Big Query. It always seems to be up and running properly when I need it. I cannot recall any times where I received any kind of application errors or unplanned outages. If there were any they were resolved quickly by my IT team so I didn't notice them.
Good page load times, efficient report completion, and minimal impact on integrated systems. Specifically, the well-designed GUI contributes to a positive user experience, and the platform's ability to automate various stages of the workflow, including Big Data processes, is highlighted as a key strength. Fast page loads: Control-M is reported to have a responsive user interface with fast page load times, allowing users to quickly navigate and manage their workflows.
I think Google BigQuery's performance is in the acceptable range. Sometimes larger datasets are somewhat sluggish to load, but for most of our applications it performs at a reasonable speed. We do have some reports that include a lot of complex calculations and others that run on granular store-level data, so they sometimes take a bit longer to load, which can be frustrating.
Support is generally excellent. Getting lower priority ones resolved can take a while, but it's rare for something to have to be dumped in the "unfixable" bin. If you end up speaking to Houston or Tel Aviv, then you know you've got a "live one".
BigQuery can be difficult to support because it is so solid as a product. Many of the issues you will see are related to your own data sets, however you may see issues importing data and managing jobs. If this occurs, it can be a challenge to get to speak to the correct person who can help you.
Very knowledgeable instructors provide a hands-on, collaborative learning experience, and we can interact directly with them to develop our Control-M skills. This format allows for immediate feedback, in-depth discussions, and tailored guidance, leading to a deeper understanding of Control-M concepts and their practical application. Face-to-face interaction fosters higher engagement and a more dynamic learning environment.
The web-based courses are simple, easy to use, and well paced, and available any time. All online courses are straightforward to access, with very practical everyday-use scenarios and solutions. They incorporate software simulations, learning games, and built-in assessments to enhance comprehension and engagement, and online subscriptions are regularly updated with the latest product information, ensuring users have access to the most current knowledge.
For HA we have to depend on an external DB; why isn't HA feasible with the embedded DB? With an external DB there are performance issues and DB fine-tuning to manage, whereas with the embedded DB, Control-M itself would take care of that functionality.
Multiple DAGs can be orchestrated simultaneously at varying times, and runs can be reproduced or replicated with relative ease. Overall, Apache Airflow is easier to use than other solutions currently on the market. Integration is simple, workflows can be monitored, and scheduling can be set up quickly. We advocate using this tool for automating data pipelines and processes.
Control-M: Known for its comprehensive workload automation capabilities, handling complex job scheduling, dependency management, and IT process automation. TWS: Traditionally strong in batch processing and job scheduling, focusing on high-performance computing environments. Tidal: Offers a combination of workload automation and IT process management, often used in mainframe environments. AutoSys: Provides job scheduling and workflow management with a reputation for scalability and performance.
Power BI can connect to GA4, for example, but the data processing is more complicated and it takes longer to create dashboards. Azure is great once the data import has been configured, but that configuration is not as easy a task for small businesses as it is with BigQuery.
While Control-M offers flexibility with usage-based and subscription-based pricing, some users might prefer more predictable, upfront costs, especially for large-scale deployments. A potential area for improvement could be offering more options for fixed-term contracts with predictable pricing based on factors like the number of agents or jobs, providing a clearer budget for long-term planning.
Awesome product. Control-M delivers advanced operational capabilities easily consumed by Dev, Ops, data teams, and lines of business. Control-M Workflow Insights: Application and data workflow observability: increased confidence that SLAs are being met for Control-M users and IT leaders. Comprehensive control and management capabilities: enhanced dashboards and reporting with constant telemetry and intelligent analysis of executing workflows. Self-service visibility: in-depth reporting to help teams work autonomously.
We have continued to expand our use of Google BigQuery over the years. I'd say its flexibility and scalability are actually quite good. It also integrates well with other tools like Tableau and Power BI. It has served the needs of multiple data sources across multiple departments within my company.
Strengths: The vendor provided strong post-sales support, timely issue resolution, and effective onboarding. Their technical team was knowledgeable and responsive, ensuring smooth integration and minimal disruption. Training resources and documentation were comprehensive. Areas for Improvement: While overall service was excellent, occasional delays in advanced customization or escalations slightly impacted timelines. More proactive optimization suggestions could further enhance value.
Google Support has kindly provided individual support and consultants to assist with the integration work. When the consultants are not present to support the work, the Google Support Helpline is always available to answer queries without a wait of more than 3 days.
Impact depends on the number of workflows. With a lot of workflows the implementation is better justified, since it requires resources — dedicated VMs and a database — that carry a cost.
Since centralizing all our workflows in Control-M, we've cut end-to-end processing time by nearly 30%.
Before Control-M we were babysitting scripts, manually rerunning failed jobs, and chasing ghost errors. With automated recovery, smart notifications, and fewer failures slipping through the cracks, we have saved 3 hours a day across teams.
Our workflows' success rate sits at 99.95%, and when things do fail, they are pinpointed immediately.
Previously, running complex queries on our on-premise data warehouse could take hours. Google BigQuery processes the same queries in minutes. We estimate it saves our team at least 25% of their time.
We can target our marketing campaigns very easily and understand our customer behaviour. It lets us personalize marketing campaigns and product recommendations and experience at least a 20% improvement in overall campaign performance.
Now, we only pay for the resources we use. Saved $1 million annually on data infrastructure and data storage costs compared to our previous solution.