Data Integration Tools
Best Data Integration Tools
- Top Rated Data Integration Tools include: Centerprise Data Integrator, Oracle GoldenGate, Talend Open Studio, PowerCenter, Oracle Warehouse Builder, and Informatica Enterprise Data Integration.
- Other Data Integration Tools on the TrustMap include: InfoSphere, SAS/Access, SSIS, Talend Data Integration, and SAS Data Integration Studio.
TrustMaps are two-dimensional charts that compare products based on satisfaction ratings and research frequency by prospective buyers. Products must have 10 or more ratings to appear on this TrustMap, and those above the median line are considered Top Rated.
Data Integration Tools Overview
What are Data Integration Tools?
The need for data integration emerges from complex data center environments where multiple systems each create large volumes of data. This data must be understood in aggregate, rather than in isolation. Data integration is the set of techniques and technologies for providing a unified and consistent view of enterprise-wide data.
Data Integration Tools Features & Capabilities
Ability to process data from a wide variety of sources such as mainframes, enterprise applications, spreadsheets, proprietary databases, etc.
Ability to process unstructured data from social media, email, web pages, etc.
Syntactic and semantic checks to make sure data conforms to business rules and policies
Deduplication and removal of incorrectly or improperly formatted data
Support for metadata
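As a rough illustration of the checking and deduplication features above, here is a minimal Python sketch. The field names (`email`, `signup_date`, `country`, `postal_code`) and the postal-code rule are hypothetical and chosen only for the example; commercial tools let you configure such rules rather than hand-code them.

```python
import re
from datetime import date

# Deliberately loose email pattern; an assumption for this sketch only.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def syntactic_errors(record):
    """Format-level checks: is each field shaped correctly?"""
    errors = []
    if not EMAIL_RE.match(record.get("email", "")):
        errors.append("email is malformed")
    try:
        date.fromisoformat(record.get("signup_date", ""))
    except ValueError:
        errors.append("signup_date is not an ISO date")
    return errors


def semantic_errors(record):
    """Business-rule checks: does the value make sense? (hypothetical rule)"""
    errors = []
    if record.get("country") == "US" and len(record.get("postal_code", "")) != 5:
        errors.append("US postal_code must have 5 digits")
    return errors


def clean(records):
    """Split records into accepted and rejected, dropping duplicates by email."""
    seen = set()
    accepted, rejected = [], []
    for record in records:
        errors = syntactic_errors(record) + semantic_errors(record)
        if errors:
            rejected.append((record, errors))
            continue
        key = record["email"].lower()  # case-insensitive duplicate key
        if key in seen:
            continue  # duplicate of a record already accepted
        seen.add(key)
        accepted.append(record)
    return accepted, rejected


SAMPLE = [
    {"email": "ana@example.com", "signup_date": "2023-01-15", "country": "US", "postal_code": "94105"},
    {"email": "ANA@example.com", "signup_date": "2023-01-15", "country": "US", "postal_code": "94105"},
    {"email": "bob-at-example.com", "signup_date": "2023-02-01", "country": "DE", "postal_code": "10115"},
    {"email": "cy@example.com", "signup_date": "15/01/2023", "country": "US", "postal_code": "941"},
]

ACCEPTED, REJECTED = clean(SAMPLE)
for record, errors in REJECTED:
    print(record["email"], errors)
```

Keeping rejected records together with their error messages, rather than silently dropping them, makes it possible to report data quality problems back to the owners of the source systems.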
Types of Data Integration
There are several approaches to achieving this goal, and they differ enough that each solves a slightly different problem. The main technologies for data integration are Extract, Transform, Load (ETL), Enterprise Application Integration (EAI), and Enterprise Information Integration (EII), which today is more often called data virtualization.
Products listed in this category belong to the ETL data integration approach. Unlike the other listed approaches, ETL is designed for data migration and integration of large volumes of data to provide a basis for decision-making.
What is ETL?
ETL is a process whereby large volumes of required data are extracted from various databases and converted into a common format. The data is then cleaned and loaded into a specialized reporting database called a data warehouse, where it is available for standard reporting purposes.
The data used in ETL can come from any source including flat files, Excel data, application data like CRM or ERP data, or mainframe application data. Perhaps the most difficult part of the process is the “Transform” component. Here, not only must the data be cleansed and any duplicates removed, but the software also has to resolve data consistency issues. It applies rules to consistently convert data to the appropriate form for the data warehouse or repository.
Once the data has been loaded into a data warehouse it is available for querying by business intelligence front-end processes that can pull consolidated data into reports and dashboards.
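The extract, transform, and load steps described above can be sketched in a few lines of Python using only the standard library. Everything specific here is an assumption made for illustration: the two sources (a CRM export as CSV and ERP rows as dictionaries), their column names, the `fact_orders` table, and the choice of an in-memory SQLite database as a stand-in for a data warehouse.

```python
import csv
import io
import sqlite3
from decimal import Decimal

# Two hypothetical sources that describe orders in different layouts.
CRM_CSV = (
    "customer,amount_usd,order_date\n"
    "Acme,120.50,2023-03-01\n"
    "Globex,80.00,2023-03-02\n"
)
ERP_ROWS = [
    {"cust_name": "ACME ", "total_cents": 12050, "date": "01/03/2023"},  # same order as the CRM's Acme row
    {"cust_name": "Initech", "total_cents": 4500, "date": "05/03/2023"},
]


def extract():
    """Extract: read raw rows from each source as-is."""
    crm_rows = list(csv.DictReader(io.StringIO(CRM_CSV)))
    return crm_rows, ERP_ROWS


def transform(crm_rows, erp_rows):
    """Transform: convert both layouts to (customer, amount_cents, ISO date)."""
    rows = set()  # a set removes exact duplicates once formats agree
    for r in crm_rows:
        cents = int(Decimal(r["amount_usd"]) * 100)
        rows.add((r["customer"].strip().title(), cents, r["order_date"]))
    for r in erp_rows:
        day, month, year = r["date"].split("/")  # assumes DD/MM/YYYY
        rows.add((r["cust_name"].strip().title(), r["total_cents"], f"{year}-{month}-{day}"))
    return sorted(rows)


def load(rows):
    """Load: write the cleaned rows into the reporting table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fact_orders (customer TEXT, amount_cents INTEGER, order_date TEXT)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


WAREHOUSE = load(transform(*extract()))

# A reporting query of the kind a BI front end might issue.
REPORT = WAREHOUSE.execute(
    "SELECT customer, SUM(amount_cents) FROM fact_orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(REPORT)
```

Most of the work sits in `transform`: the two sources disagree on name casing, currency units, and date format, and the duplicate Acme order only becomes detectable after those differences are resolved.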
Shortcomings of Data Warehouses
One shortcoming of the data warehouse approach is that the data is not always current. Data warehouses pull data from databases periodically in batches, not in real time. If the data in the source database has changed, this might not be reflected in the data in the warehouse. Various strategies can be employed to achieve “real-time ETL”, although some of them place a significant load on the database. This can have performance repercussions.
The simplest approach is to increase the frequency of batch updates until processing is near real time. Other solutions include continuously feeding the database using real-time data transport technologies, the use of staging tables, or a real-time data cache.
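One way frequent batch updates and a staging table might fit together is shown in the sketch below. It assumes the source table carries an `updated_at` timestamp that can serve as a watermark; the table names, columns, and the `sync` function are hypothetical, and two in-memory SQLite databases stand in for the source system and the warehouse.

```python
import sqlite3

SCHEMA = "(id INTEGER PRIMARY KEY, status TEXT, updated_at TEXT)"

SOURCE = sqlite3.connect(":memory:")
SOURCE.execute(f"CREATE TABLE orders {SCHEMA}")

TARGET = sqlite3.connect(":memory:")
TARGET.execute(f"CREATE TABLE orders {SCHEMA}")
TARGET.execute("CREATE TABLE staging_orders (id INTEGER, status TEXT, updated_at TEXT)")


def sync(source, target, watermark):
    """Copy only rows changed since `watermark`; return the new watermark."""
    changed = source.execute(
        "SELECT id, status, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    # Land the batch in a staging table first, then merge it in one statement.
    target.execute("DELETE FROM staging_orders")
    target.executemany("INSERT INTO staging_orders VALUES (?, ?, ?)", changed)
    target.execute(
        "INSERT OR REPLACE INTO orders "
        "SELECT id, status, updated_at FROM staging_orders"
    )
    target.commit()
    return changed[-1][2] if changed else watermark


# First batch: two new orders.
SOURCE.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "new", "2023-03-01T10:00:00"), (2, "new", "2023-03-01T10:05:00")],
)
WATERMARK = sync(SOURCE, TARGET, "")

# Second batch: only the updated row is read from the source.
SOURCE.execute(
    "UPDATE orders SET status = 'shipped', updated_at = '2023-03-01T10:10:00' WHERE id = 1"
)
WATERMARK = sync(SOURCE, TARGET, WATERMARK)

ROWS = TARGET.execute("SELECT id, status FROM orders ORDER BY id").fetchall()
print(ROWS, WATERMARK)
```

Because each run reads only rows newer than the watermark, it can be scheduled often without rescanning the whole source table, which is where much of the load on the source database would otherwise come from. A timestamp watermark can miss rows written later with an identical timestamp, so a production setup would need a stricter change-tracking mechanism.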
Enterprise-level data integration tools can be very expensive, with some products costing upwards of $10,000 per user per year. On top of that, you may need to pay for professional services to get up and running. SMB solutions are significantly cheaper than this.
Data Integration Products
Microsoft's SQL Server Integration Services (SSIS) is a data integration solution.
Oracle GoldenGate is database management software that provides data integration and availability support for heterogeneous databases.
Centerprise Data Integrator is an integration platform that includes tools for data integration, data transformation, data quality, and data profiling.
Oracle has about 3,500 data integration customers and competes with Informatica, IBM InfoSphere, Talend, and SAS DataFlux.
Dataloader.io delivers a cloud based solution to import and export information from Salesforce.
Informatica is the leading vendor in the data integration space with about 5,000 customers.
IBM InfoSphere is an enterprise grade master data management solution used by over 700 customers. It competes with Oracle's Siebel UCM product and Informatica.
The Talend Integration Suite, from Talend, is a set of tools for data integration.
Oracle Warehouse Builder is a data integration solution, from Oracle.
Cisco Data Virtualization, formerly Composite (acquired July 2013), is, as the name might suggest, a data or datacenter virtualization platform.
SAP's Sybase Replication Server is database development and management software.
SAP NetWeaver Process Integration is an application integration solution.
The elastic.io Integration Platform is designed as a set of tools for data transformation and data integration both cloud-to-cloud and cloud-to-ground. It belongs to the hybrid integration platform category, as it can be deployed in the cloud as well as on-premise. The elastic.io platform is mi...
Analyza is a business intelligence solution that is provided by PIT Business. The vendor provides support services in Luxembourg, Belgium, France, and other parts of Europe. Analyza allows users to: build several indicators and dashboards; build ad-hoc analysis without IT knowledge; navigate throu...
SAS Data Integration Studio is, as the name would suggest, a data integration solution from SAS.
RepreZen™ API Studio is an enterprise-class API design platform, built from the ground up to meet the demands of large-scale integration programs. The vendor says that while other tools only address individual APIs, RepreZen optimizes at the organizational scale, aligning interfaces and streamlin...
BusinessObjects Data Integrator from SAP is a data integration platform.
According to the vendor, Task Factory offers essential, high-performance components and tasks for SQL Server Integration Services (SSIS) that eliminate the need for programming. With over 50 components, Task Factory aims to increase productivity, improve performance and increase ROI. Task Factory...
SAS DataFlux handles data profiling, matching, cleansing, and monitoring. Its capabilities are available as individual products or as a platform. DataFlux competes with Informatica, Trillium, Ataccama, and SAP Data Quality Management.
Originally developed by Pervasive Software, Actian DataConnect (formerly Actian Integration Hub) is a data integration technology.
Denodo is the eponymous data integration platform from the global company headquartered in Silicon Valley.