IBM® DataStage® is a data integration tool that helps users to design, develop and run jobs that move and transform data. At its core, the DataStage tool supports extract, transform and load (ETL) and extract, load and transform (ELT) patterns. A basic version of the software is available for on-premises deployment, and the cloud-based DataStage for IBM Cloud Pak® for Data offers automated integration capabilities in a hybrid or multicloud environment.
Informatica Cloud Data Quality
Score 6.8 out of 10
The vendor states that Informatica Data Quality empowers companies to take a holistic approach to managing data quality across the entire organization, and that with Informatica Data Quality, users are able to ensure the success of data-driven digital transformation initiatives and projects across users, types, and scale, while also automating mission-critical tasks.
Currently not using any of the Informatica tools, so I don't have a real way of comparing them. But in comparison with Microsoft SSIS (SQL Server Integration Services), I'd say DataStage stacks up favorably. DataStage is a powerful tool for ETL processes that integrates …
DataStage is somewhat outdated for an ETL tool. I guess that's why it lags a bit behind its competitors. It can be used for data processing, sure, but its performance seems slow given the server it is running on. I wouldn't depend on this application for handling a lot of mission-critical banking and business data.
For effective data collaboration and systematic verification of customer information such as addresses, Informatica Data Quality is a fruitful application to consider. Besides, Informatica Data Quality controls quality through a cleansing process, giving the company a professional outline of candid data profiling and reputable analytics. Finally, Informatica Data Quality allows simple navigation of content, with a dashboard that supports predictability.
The matching algorithms in IDQ are very powerful if you understand the different types that they offer (e.g., Hamming Distance, Jaro, Bigram, etc.). We had to play around with them to see which best suited our own needs of identifying and eliminating duplicate customers. Setting up the whole process (e.g., creating the KeyGenerator Transformation, setting up the matching threshold, etc.) can be somewhat time-consuming and a challenge if you don't first standardize your data.
The integration with PowerCenter is great if you have both. You can either import your mappings directly to PowerCenter or to an XML file. The only downside is that some of the transformations are unique to IDQ, so you are not really able to edit them once in PowerCenter.
The standardizer transformation was key in helping us standardize our customer data (e.g., names, addresses, etc.). It was helpful because we could create a reference table containing the standardized value and the associated unstandardized values. What was great was that if you used Informatica Analyst, a business analyst could log in and correct any of the values.
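The standardize-then-match workflow that the reviewer describes can be illustrated with a minimal Python sketch. This is not IDQ's implementation or API: the reference table contents, the function names, the 0.8 threshold, and the choice of a Dice coefficient over character bigrams as the "Bigram" measure are all assumptions made for illustration. Jaro is omitted for brevity.

```python
from itertools import combinations

# Hypothetical reference table: standardized value -> known unstandardized variants.
REFERENCE_TABLE = {
    "STREET": ["st", "st.", "str", "street"],
    "AVENUE": ["ave", "ave.", "av", "avenue"],
}

# Invert the table so each variant can be looked up directly.
VARIANT_TO_STANDARD = {
    variant: standard
    for standard, variants in REFERENCE_TABLE.items()
    for variant in variants
}


def standardize(text):
    """Upper-case every token, replacing known variants with their standard form."""
    tokens = text.lower().replace(",", " ").split()
    return " ".join(VARIANT_TO_STANDARD.get(token, token.upper()) for token in tokens)


def hamming_distance(a, b):
    """Count positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def bigrams(text):
    """Return the set of adjacent character pairs in text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def bigram_similarity(a, b):
    """Dice coefficient over character bigram sets, from 0.0 to 1.0."""
    pairs_a, pairs_b = bigrams(a), bigrams(b)
    if not pairs_a and not pairs_b:
        return 1.0
    return 2 * len(pairs_a & pairs_b) / (len(pairs_a) + len(pairs_b))


def find_duplicates(records, threshold=0.8):
    """Return pairs of original records whose standardized forms meet the threshold."""
    cleaned = [standardize(record) for record in records]
    return [
        (records[i], records[j])
        for i, j in combinations(range(len(records)), 2)
        if bigram_similarity(cleaned[i], cleaned[j]) >= threshold
    ]
```

Calling `find_duplicates(["12 Main St.", "12 main street", "99 Oak Ave"])` pairs the first two records, because both standardize to the same string before matching. Comparing the raw strings instead would score them lower, which is consistent with the reviewer's point that matching is harder when the data has not been standardized first.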
Technical support is a key area IBM should improve for this product. Sometimes our case is assigned to a support engineer who has no knowledge of the product or services.
Provide custom reports for DataStage jobs and performance, such as job history reports, warning messages or error messages.
Make it fully compatible with Oracle, so that users can use the Oracle ODBC drivers directly instead of the DataDirect driver. Same for SQL Server.
As pointed out earlier, due to all the robust features IDQ has, our use of the product is successful and stable. IDQ is being used with multiple sources (from the CRM application and in batch mode). As this is an iterative process, we are looking to improve our system efficiency using IDQ.
Because it is robust, and it is being continuously improved. DS is one of the most used and recognized tools in the market. Large companies have implemented it in the first instance to develop their DW, but finding the advantages it has, they could use it for other types of projects such as migrations, application feeding, etc.
It can load thousands of records in seconds. But in the Parallel version, you need to understand how to partition the data. If you use the partitioning algorithms, or the functionalities it provides for parsing data, incorrectly, the performance can fall drastically, even with few records. It is necessary to have people with experience to be able to determine which algorithm to use and understand why.
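The reviewer's point about choosing a partitioning algorithm can be illustrated with a small, tool-independent Python sketch. It does not use any DataStage API. The function names, the use of CRC32 as the hash, and the row layout are assumptions for illustration only. It contrasts key-based hash partitioning, which keeps rows with the same key together but can become unbalanced on a skewed key, with round-robin partitioning, which balances row counts but scatters keys.

```python
import zlib


def hash_partition(rows, key, num_partitions):
    """Send each row to a partition chosen by a hash of its key value.

    Rows with the same key always land in the same partition.
    """
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # CRC32 is used because it is deterministic across runs (an assumption
        # for this sketch; real engines use their own hash functions).
        index = zlib.crc32(str(row[key]).encode("utf-8")) % num_partitions
        partitions[index].append(row)
    return partitions


def round_robin_partition(rows, num_partitions):
    """Deal rows out to partitions in turn, ignoring their content."""
    partitions = [[] for _ in range(num_partitions)]
    for position, row in enumerate(rows):
        partitions[position % num_partitions].append(row)
    return partitions


def partition_sizes(partitions):
    """Return the number of rows in each partition."""
    return [len(partition) for partition in partitions]
```

With 100 rows in which 90 share the same key value, `hash_partition` over 4 partitions places at least 90 rows in a single partition, so one worker does most of the work. `round_robin_partition` gives each partition 25 rows, but rows sharing a key are no longer together, which matters for key-based steps such as joins or duplicate removal.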
IBM offers different levels of support, but in my experience being an IBM shop helps to get direct support from more knowledgeable technicians at IBM. I'm not sure about the cost of having this kind of support, but I know there's also general support, and community blogs and websites on the Internet make it easy to troubleshoot issues whenever there's a need for that.
With effective capabilities, easy-to-use features, accurate data analytics, and cloud services automation, this IBM platform is reliable and makes document management easy. The platform is well equipped for big data management and makes it easy to produce accurate data analytics.
IDQ is used by a department at my organisation to ensure and enhance data quality. The usage started with address standardization and has now been brought to a whole new level of quality checking, where it fixes duplicates and junk characters and standardizes names, streets and product descriptions. In the past we had issues mainly with duplicate customers and products, and these were affecting the sales projections and estimates.
It’s hard to say at this point. It delivers, but not quite as I expected. It takes a lot of resources (manpower, financial) to manage and sort this out.
Definitely, I don’t have the exact numbers, but given the data it processes, it is A LOT. So props to the developer of this application.
Again, based on my experience, I’d choose other ETL apps if there is one that's more user-friendly.