What are Data Preparation Tools?
Data preparation tools are a new class of software products designed to enable business analysts and data scientists to bypass data warehouses to perform some data integration and data preparation themselves before analysis. Data preparation tools handle as much of the data “cleaning” process as possible. Data prep features are often found within larger tools, such as data analytics platforms, BI tools, integration platforms, and broader machine learning platforms.
Data preparation tools can search for and access data throughout an organization, combine it with other external data sets, and do data cleansing and conversions as required before feeding the data back into business intelligence systems for analysis. These emerging tools use machine learning under the hood so that they can iterate and learn where to find insights in data sets, without being explicitly programmed to do so.
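As a rough illustration of the combine, cleanse, and convert steps described above, here is a minimal Python sketch. It does not reflect how any particular product works; the records, field names, and the `clean_row` and `prepare` helpers are all hypothetical and exist only for this example.

```python
# Hypothetical records; field names and values are invented for illustration.
internal_sales = [
    {"customer_id": " 001", "region": "north ", "revenue": "1,200.50"},
    {"customer_id": "002", "region": "South", "revenue": "n/a"},
    {"customer_id": "002", "region": "South", "revenue": "n/a"},  # duplicate row
]
# Hypothetical external data set, keyed by customer ID.
external_segments = {"001": "enterprise", "002": "smb"}


def clean_row(row):
    """Trim whitespace, normalize case, and convert revenue to a float (or None)."""
    raw = row["revenue"].replace(",", "").strip()
    try:
        revenue = float(raw)
    except ValueError:
        revenue = None  # unparseable values become explicit missing values
    return {
        "customer_id": row["customer_id"].strip(),
        "region": row["region"].strip().lower(),
        "revenue": revenue,
    }


def prepare(rows, segments):
    """Clean rows, drop exact duplicates, and join in an external attribute."""
    seen = set()
    prepared = []
    for row in map(clean_row, rows):
        key = tuple(row.items())  # identity of the cleaned row, used for de-duplication
        if key in seen:
            continue
        seen.add(key)
        row["segment"] = segments.get(row["customer_id"], "unknown")
        prepared.append(row)
    return prepared


analysis_ready = prepare(internal_sales, external_segments)
```

In this sketch, `analysis_ready` holds two rows: the duplicate is dropped, the revenue string is converted to a number where possible, and each row gains a `segment` value from the external data. A commercial tool would typically expose the same kinds of steps through a visual interface rather than code.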
Self-service Data Preparation
A major role of data preparation tools is to get data into an analysis-ready state for end users with minimal or no data science knowledge. Historically, preparing data at any meaningful scale has required IT or data science resources. Data preparation tools aim to democratize this process by making data preparation accessible to a wider range of users, from IT specialists to data analysts to line-of-business users.
Data preparation tools use several different features and capabilities to enable business-wide self-service. The most important features that virtually all modern data preparation tools include are:
Integration with all sources of data within the business
Machine learning for automated insights and recommended preparation steps
Data governance for repeatability and tracking
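To make the "repeatability and tracking" item more concrete, here is one way it could be pictured in Python: preparation steps are stored as an ordered recipe that can be rerun, and each run records what every step did. This is only a sketch under assumed names (`RECIPE`, `run_recipe`, and the two step functions are hypothetical), not a description of any vendor's governance features.

```python
def strip_whitespace(rows):
    """Trim leading and trailing whitespace from every field."""
    return [{k: v.strip() for k, v in row.items()} for row in rows]


def drop_empty_emails(rows):
    """Remove rows whose email field is empty."""
    return [row for row in rows if row["email"]]


# Hypothetical recipe: each step is a (name, function) pair applied in order.
RECIPE = [
    ("strip_whitespace", strip_whitespace),
    ("drop_empty_emails", drop_empty_emails),
]


def run_recipe(rows, recipe):
    """Apply each step in order and record row counts before and after it."""
    audit_log = []
    for name, step in recipe:
        rows_in = len(rows)
        rows = step(rows)
        audit_log.append({"step": name, "rows_in": rows_in, "rows_out": len(rows)})
    return rows, audit_log


raw = [{"email": " a@example.com "}, {"email": "   "}]
cleaned, log = run_recipe(raw, RECIPE)
```

Because the recipe is data rather than a one-off manual edit, the same steps can be applied to next month's extract, and the log gives a simple trail of how many rows each step removed.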
Data Preparation Tools Comparison
Data preparation tools can be challenging to compare. When evaluating different options, consider these factors:
Visual Interface: Visual interfaces have become the norm for data preparation tools. Buyers should try working with each interface to get a better sense of its ease of use, especially given the sophistication level of the expected user base (e.g., data scientists vs. non-specialized users). The quality and usability of interfaces are also frequently noted in data preparation reviews.
Tech Stack Integrations: How well does each tool integrate with the organization's existing data sources? Data prep tools should make data easy for end users to access, but if the tool does not cleanly interface with each data source, users will continue to struggle to centralize data for cleaning, and may even resort to manual processes.
Machine Learning Capabilities: Most data preparation tools advertise some element of machine learning or AI assistance. However, not all smart tech is created equal. Follow up with each vendor on exactly what this technology can do for users, especially how it assists less data-savvy users working within the data preparation tool.
Pricing varies primarily depending on whether the product is a standalone data prep tool or part of a larger integration or analytics solution. Leaders in the space charge between $100 and $450 per user per month. There are some free, open-source options as well.