Data Preparation Tools

Data Preparation Tools Overview

What are Data Preparation Tools?

Data preparation tools are a relatively new class of software designed to let business analysts and data scientists bypass the data warehouse and perform some data integration and preparation themselves before analysis. These tools aim to automate as much of the data “cleaning” process as possible. Data prep features are also often built into larger products, such as data analytics platforms, BI tools, integration platforms, and broader machine learning platforms.

Data preparation tools can search for and access data throughout an organization, combine it with external data sets, and perform the necessary cleansing and conversions before feeding the data back into business intelligence systems for analysis. Many of these emerging tools use machine learning under the hood, so they can iterate and learn where to find insights in data sets without being explicitly programmed to do so.
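To make the combine, cleanse, and convert steps concrete, here is a minimal sketch in plain Python. It assumes two small, hypothetical in-memory data sets (`crm_rows` as internal records and `region_rows` as external reference data) keyed by a customer ID. The rules are hand-written rather than learned, so this illustrates only the shape of the work, not the machine learning features described above.

```python
from decimal import Decimal, InvalidOperation

# Hypothetical internal records with inconsistent whitespace, casing, and number formats.
crm_rows = [
    {"customer_id": " C001 ", "name": "acme corp", "revenue": "1,200.50"},
    {"customer_id": "c002", "name": " Globex ", "revenue": "980"},
    {"customer_id": "C003", "name": "Initech", "revenue": "n/a"},
]

# Hypothetical external reference data keyed by customer ID.
region_rows = [
    {"customer_id": "C001", "region": "EMEA"},
    {"customer_id": "C002", "region": "APAC"},
]


def clean_row(row):
    """Normalize one record; return None if its revenue cannot be converted."""
    try:
        # Convert "1,200.50"-style text into an exact decimal number.
        revenue = Decimal(row["revenue"].replace(",", "").strip())
    except InvalidOperation:
        return None  # unparseable value: drop the row rather than guess
    return {
        "customer_id": row["customer_id"].strip().upper(),
        "name": row["name"].strip().title(),
        "revenue": revenue,
    }


def prepare(internal, external):
    """Cleanse the internal rows, then left-join the external region attribute."""
    regions = {r["customer_id"]: r["region"] for r in external}
    prepared = []
    for row in internal:
        cleaned = clean_row(row)
        if cleaned is None:
            continue
        cleaned["region"] = regions.get(cleaned["customer_id"], "UNKNOWN")
        prepared.append(cleaned)
    return prepared


if __name__ == "__main__":
    for record in prepare(crm_rows, region_rows):
        print(record)
```

Dropping the unparseable row is one design choice among several; a real tool might instead flag it for review or let the user pick a default.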

Self-service Data Preparation

A central goal of data preparation tools is to get data into an analysis-ready state for end users with minimal or no data science knowledge. Historically, data preparation at any scale has required IT or data science resources. Data preparation tools aim to democratize this process by making it accessible to a wider range of users, from IT specialists to data analysts to line-of-business users.

Data preparation tools use several different features and capabilities to enable business-wide self-service. The most important features that virtually all modern data preparation tools include are:

  • Visual interfaces

  • Integration with all sources of data within the business

  • Machine learning for automated insights and recommended preparation steps

  • Data governance for repeatability and tracking
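The governance point about repeatability and tracking can be pictured as a replayable recipe: an ordered list of named steps that is rerun on each new batch of data, with a log of what each step did. The sketch below is a hypothetical illustration in plain Python; the `Recipe` class and the three step functions are invented for this example and are not any product's API. Many tools express a similar idea as saved flows or recipes, usually with much richer lineage metadata.

```python
from dataclasses import dataclass, field
from typing import Callable

Row = dict
Step = Callable[[list[Row]], list[Row]]


@dataclass
class Recipe:
    """Hypothetical ordered, named list of preparation steps that can be replayed."""
    steps: list[tuple[str, Step]] = field(default_factory=list)

    def add(self, name: str, step: Step) -> "Recipe":
        self.steps.append((name, step))
        return self  # allow chaining

    def apply(self, rows: list[Row]) -> tuple[list[Row], list[dict]]:
        """Run every step in order and record a simple lineage log of row counts."""
        log = []
        for name, step in self.steps:
            before = len(rows)
            rows = step(rows)
            log.append({"step": name, "rows_in": before, "rows_out": len(rows)})
        return rows, log


def drop_missing_email(rows):
    return [r for r in rows if r.get("email")]


def lowercase_email(rows):
    return [{**r, "email": r["email"].lower()} for r in rows]


def dedupe_by_email(rows):
    seen, out = set(), []
    for r in rows:
        if r["email"] not in seen:
            seen.add(r["email"])
            out.append(r)
    return out


# The same recipe can be applied unchanged to every new batch of data.
recipe = (
    Recipe()
    .add("drop rows without email", drop_missing_email)
    .add("lowercase email", lowercase_email)
    .add("deduplicate on email", dedupe_by_email)
)

if __name__ == "__main__":
    batch = [{"email": "A@x.com"}, {"email": "a@x.com"}, {"email": ""}]
    cleaned, lineage = recipe.apply(batch)
    print(cleaned)
    for entry in lineage:
        print(entry)
```

Keeping each step a pure function of its input rows is what makes the recipe safe to replay: the same input always produces the same output and the same log.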

Data Preparation Tools Comparison

Data preparation tools can be challenging to compare. When evaluating different options, consider these factors:

  • Visual Interface: Visual interfaces have become the norm for data preparation tools. Buyers should try each interface to get a sense of how easy it is to use, especially for the sophistication level of the expected user base (e.g., data scientists vs. non-specialists). Interface quality and usability also come up frequently in data preparation reviews.

  • Tech Stack Integrations: How well does each tool integrate with the organization's existing data sources? Data prep tools should make data easy for end users to access, but if a tool does not connect cleanly to each data source, users will still struggle to centralize data for cleaning and may even fall back on manual processes.

  • Machine learning capabilities: Most data preparation tools advertise some form of machine learning or AI assistance. However, not all smart tech is created equal. Follow up with each vendor on exactly what this technology can do for users, especially for less data-savvy users working in the tool.


Pricing Information

Pricing varies mainly with whether the product is a standalone data prep tool or part of a larger integration or analytics solution. Leaders in the space charge between $100 and $450 per user per month. There are also some free open source options.
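As a rough budgeting aid, a per-user monthly range translates into annual list cost as users × monthly rate × 12. The helper below is a hypothetical sketch; it defaults to the $100 to $450 range quoted above and assumes simple per-seat pricing with no volume discounts, platform fees, or usage-based charges, any of which may apply in practice.

```python
def annual_seat_cost_range(
    users: int, low_per_month: int = 100, high_per_month: int = 450
) -> tuple[int, int]:
    """Hypothetical helper: rough annual list-price range for per-user pricing."""
    if users < 0:
        raise ValueError("users must be non-negative")
    months = 12
    return users * low_per_month * months, users * high_per_month * months


if __name__ == "__main__":
    low, high = annual_seat_cost_range(10)
    print(f"10 users: ${low:,} to ${high:,} per year")  # 10 users: $12,000 to $54,000 per year
```

Passing different rate arguments lets the same calculation cover quotes outside the default range.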

Data Preparation Products


IBM Cognos Analytics
358 ratings
67 reviews
IBM Cognos is a full-featured business intelligence suite by IBM, designed for larger deployments. It comprises Query Studio, Reporting Studio, Analysis Studio and Event Studio, and Cognos Administration along with tools for Microsoft Office integration, full-text search, and dashboards.
Alteryx
121 ratings
46 reviews
Alteryx aims to be the launchpad for automation breakthroughs. Whether the goal is personal growth, transformative digital outcomes, or rapid innovation, the vendor claims users will see unparalleled results. Alteryx converges analytics, data science and process automation into one platform, to enab…
JMP Statistical Discovery Software from SAS
50 ratings
26 reviews
JMP is a division of SAS; the JMP family of products provides statistical discovery tools linked to dynamic data visualizations.
Datawatch Monarch
11 ratings
6 reviews
Datawatch Monarch works with both relational and multi-structured data including support for a wide range of formats including PDF, XML, HTML, text, spool and ASCII files. The product can access data from invoices, sales reports, balance sheets, customer lists, inventory, logs and more. According …
Databricks Unified Analytics Platform
19 ratings
5 reviews
Databricks, based in San Francisco, offers the Databricks Unified Analytics Platform, a data science platform and Apache Spark cluster manager. The Databricks Unified Data Service aims to provide a reliable and scalable platform for data pipelines, data lakes, and data platforms. Users can manage full data …
Tableau Prep
7 ratings
4 reviews
Tableau Prep enables users to get to the analysis phase faster by helping them quickly combine, shape, and clean their data. According to the vendor, a direct and visual experience helps provide users with a deeper understanding of their data, smart features make data preparation simple, and integ…
ReportMiner
2 ratings
2 reviews
ReportMiner provides an automated solution for data ingestion and integration for unstructured document data sources. This data extraction software enables you to liberate business data trapped in documents such as PDFs, PDF forms, PRN, TXT, RTF, DOC, DOCX, XLS, and XLSX. With features for data scrub…
Trifacta
3 ratings
2 reviews
Trifacta is a "data wrangling" (or data preparation) platform designed particularly for use with Hadoop, from the company Trifacta, headquartered in San Francisco, California.
NL Suite (formerly Cogito Intelligence Platform from Expert System)
0 ratings
1 review
The NL Suite (formerly the Cogito Intelligence Platform (CIP) from Expert System, since rebranded) analyzes unstructured data sets to organize, discover and explore information, supporting intelligence workflows by providing actionable insight as data emerges. CIP lever…
AWS Glue
5 ratings
1 review
AWS Glue is a managed extract, transform, and load (ETL) service designed to make it easy for customers to prepare and load data for analytics. With it, users can create and run an ETL job in the AWS Management Console. Users point AWS Glue to data stored on AWS, and AWS Glue discovers data and stor…
Discover (formerly Cogito Discover from Expert System)
Discover (formerly Cogito Discover from Expert System, since rebranded) provides comprehension of written text at scale to ensure that the insight and knowledge embedded in unstructured information is available for strategic and operational analysis and easily usable by process automa…
Openprise
Openprise is a data automation solution designed to help business users manage and share big data and build do-it-yourself automation directly from that data. The vendor says Openprise requires no software to deploy and no coding; it is designed so that users who are familiar with spreadshe…
Tamr
Tamr is the eponymous data integration platform from the Cambridge, Massachusetts-based company.
Keboola Connection
Keboola provides an open, extensible, cloud-based data integration platform that enables clients to combine, enhance and publish data for their internal analytics projects and data products. Keboola aims to help companies of all sizes: reduce time to launch for analytics projects; enable collaborat…
Astronomer
Astronomer captures every user event and automatically routes it anywhere. According to the vendor, with a single snippet of code, any team across an organization can connect to the tools they love. A simple user interface allows you to toggle your favorites on and off and experiment with new on…
Upsolver
Upsolver is an in-memory data preparation platform that aims to remove the complexity from big data and real-time projects and shorten their implementation time from weeks or months to several hours. According to the vendor, it is powered by its Volcano™ technology and queries an entire data lake in less than a millise…
Inzata Analytics
Inzata Analytics promises to be a full-service data analytics platform for integrating, enriching, and exploring data of any kind, from any source, at massive scale. Its AI-powered data modeling and patented analytics engine aim to help users load, blend, and model raw and unstructured data at grea…
Lore IO
Lore IO is a collaborative data unification platform that promises to help companies ingest and unify disparate data sets from hundreds or thousands of sources. It generates standard outputs without the need for engineers to develop procedural ETL and data pipelines. The vendor aims to empower busin…
Mu Sigma muRx
Mu Sigma, headquartered in Northbrook, offers muRx, a problem space modeling, analytics planning and data preparation tool designed to support business decisioning and BI processes.
Netlink Dataware
Netlink, headquartered in Wisconsin, offers Dataware, a platform for extracting and preparing data, performing data cleansing, data mapping, data conversion, and data combining.
Hive Data
Hive Data is a fully managed, distributed data labeling platform optimized to create training data to build computer vision models. It was built specifically for humans to label unstructured visual and audio data. Data labeling allows Hive to produce highly accurate and customizable proprietary data…
Palantir Foundry
Palantir Technologies, headquartered in Palo Alto, offers Palantir Foundry, a data governance tool which checks data quality, ensures a common ontology for all enterprise data, maintains consistency between data and business logic, and performs other tasks to maximize the worth of enterprise data for …
Alegion
Alegion, headquartered in Austin, offers its data labeling and annotation platform, designed to deliver production-grade data volume and quality. Advanced machine learning capabilities like conditional logic, multi-stage workflows, and quality control routing accelerate data annotation, reservin…

Frequently Asked Questions

What do data preparation tools do?

Data preparation tools help streamline and automate the process of extracting, compiling, and “cleaning” data so it can be easily analyzed and reported on.

Who uses data preparation tools?

Data preparation tools are primarily used by data analysts and similar roles, but many tools are becoming more accessible for line-of-business users as well.

What other tools have data preparation features?

Data preparation can also be found in many analytics platforms, BI tools, and integration platforms.

What are the benefits of data preparation tools?

Data preparation tools can save analysts a substantial amount of manual time and labor and also reduce the risk of human error in the preparation process.

How much do data preparation tools cost?

Leading data preparation tools can range from $100-500/month per seat, depending on the number of users and range of features included.