Data Deduplication Tools

Data Deduplication Tools Overview

What are Data Deduplication Tools?

Data deduplication tools are important for backup and restore operations in which large quantities of data are backed up at regular intervals. Frequent backups mean repeatedly copying and storing large data sets for recovery purposes. Because much of this data is unchanged from one backup to the next, storing every copy in full would quickly lead to unmanageably large storage requirements. Deduplicating these data streams is therefore essential to keep backup storage under control.

Deduplication is performed by an algorithm that examines an incoming data stream and compares its data segments with data that has already been stored. Segments that match are recorded as references to the existing copy instead of being written again. Not all deduplication products work in the same way, however, so there are several things to consider when choosing a product:

  • Source versus target deduplication: With source deduplication, software running on the server that is the source of the data deduplicates it before it is transmitted to the storage device. The advantage of this approach is that less data is sent to the target storage system, so less bandwidth is used for transmission. However, source deduplication can increase processing time on the source server, which is often an important consideration in virtualized environments, where there is a very large quantity of duplicate data. The alternative is target deduplication, in which all of the data is transmitted to a storage device such as a NAS appliance or tape library and is deduplicated after it arrives. This method reduces the storage capacity required for backup data, but does not reduce the amount of data sent across a LAN or WAN during backup.
  • Inline versus post-processing deduplication: With inline deduplication, data is deduplicated in real time as it is being transmitted to storage. With post-processing deduplication, all of the backup data is first written to a disk cache, and deduplication begins only after that.
  • Global deduplication: Most deduplication processes are designed to remove duplicate data from a single storage device. Global deduplication instead removes redundant data across the entire data storage infrastructure, which makes it an important consideration: it allows administrators to manage the whole backup storage environment efficiently.
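The bandwidth difference between source and target deduplication can be illustrated with a toy exchange in which the source first sends chunk fingerprints and then only the chunks the target reports as missing. This is a minimal sketch under stated assumptions, not any vendor's protocol: the `Target` class and the function names are hypothetical, chunks are a fixed 4 KiB, SHA-256 digests serve as fingerprints, the "network" is an in-process function call, and the bytes spent on the fingerprint exchange itself are not counted.

```python
import hashlib


def split_chunks(data: bytes, size: int) -> list[bytes]:
    """Fixed-size chunking (an assumption; many products use variable-size chunks)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(chunk: bytes) -> str:
    """SHA-256 hex digest used as the chunk's identity (collisions assumed not to occur)."""
    return hashlib.sha256(chunk).hexdigest()


class Target:
    """Hypothetical backup target holding a fingerprint-indexed chunk store."""

    def __init__(self) -> None:
        self.store: dict[str, bytes] = {}

    def missing(self, fps: list[str]) -> set[str]:
        """Report which fingerprints the target does not hold yet."""
        return {fp for fp in fps if fp not in self.store}

    def receive(self, new_chunks: dict[str, bytes]) -> None:
        self.store.update(new_chunks)


def target_side_backup(target: Target, data: bytes, size: int) -> int:
    """Target deduplication: every chunk crosses the network; duplicates are dropped on arrival."""
    sent = 0
    for chunk in split_chunks(data, size):
        sent += len(chunk)
        target.store.setdefault(fingerprint(chunk), chunk)
    return sent


def source_side_backup(target: Target, data: bytes, size: int) -> int:
    """Source deduplication: hash locally, then send only chunks the target lacks.

    Returns the chunk payload bytes sent (fingerprint traffic is ignored).
    """
    by_fp = {fingerprint(c): c for c in split_chunks(data, size)}
    need = target.missing(list(by_fp))
    payload = {fp: by_fp[fp] for fp in need}
    target.receive(payload)
    return sum(len(c) for c in payload.values())


# Two nightly backups; half of the data changes on the second night.
day1 = b"A" * 4096 + b"B" * 4096
day2 = b"A" * 4096 + b"C" * 4096

t_src, t_tgt = Target(), Target()
src_sent = [source_side_backup(t_src, d, 4096) for d in (day1, day2)]
tgt_sent = [target_side_backup(t_tgt, d, 4096) for d in (day1, day2)]
print(src_sent)  # [8192, 4096]
print(tgt_sent)  # [8192, 8192]
```

Both targets end up storing the same three unique chunks. The difference is on the second night: source deduplication transmits only the changed chunk, while target deduplication transmits the full backup again. The price of the source-side approach shows up in `source_side_backup`, which must hash every chunk on the source server before anything is sent.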

The main benefit of data deduplication is reduced data storage requirements and, therefore, lower costs. Deduplication can also make data restore operations more efficient, since there is less data to restore.
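The segment comparison described earlier, and the storage saving it produces, can be sketched together in Python. This is a minimal illustration under stated assumptions, not how any of the products below work: data is split into fixed-size chunks (many products use variable-size, content-defined chunking instead), SHA-256 digests serve as chunk fingerprints, and equal digests are treated as identical data. The `ChunkStore` class and its methods are hypothetical names.

```python
import hashlib


class ChunkStore:
    """Toy deduplicating store: each unique chunk is kept once, keyed by its fingerprint."""

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        self.chunks: dict[str, bytes] = {}  # SHA-256 hex digest -> chunk, stored once
        self.logical_bytes = 0  # total bytes written by clients, before deduplication

    def write(self, data: bytes) -> list[str]:
        """Split `data` into fixed-size chunks and store only chunks not seen before.

        Returns the ordered list of fingerprints (a "recipe") needed to rebuild `data`.
        """
        self.logical_bytes += len(data)
        recipe = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            fp = hashlib.sha256(chunk).hexdigest()
            # Compare with previously stored data; equal digests are assumed to mean equal chunks.
            if fp not in self.chunks:
                self.chunks[fp] = chunk
            recipe.append(fp)
        return recipe

    def read(self, recipe: list[str]) -> bytes:
        """Rebuild the original data by looking up each fingerprint in order."""
        return b"".join(self.chunks[fp] for fp in recipe)

    def physical_bytes(self) -> int:
        """Bytes actually held after deduplication."""
        return sum(len(c) for c in self.chunks.values())

    def dedup_ratio(self) -> float:
        """Logical bytes written divided by physical bytes stored."""
        physical = self.physical_bytes()
        if physical == 0:
            raise ValueError("nothing stored yet")
        return self.logical_bytes / physical


# Two "backups" that share their first eight bytes (tiny 4-byte chunks for readability).
store = ChunkStore(chunk_size=4)
r1 = store.write(b"AAAABBBBCCCC")
r2 = store.write(b"AAAABBBBDDDD")
assert store.read(r2) == b"AAAABBBBDDDD"
print(store.logical_bytes, store.physical_bytes(), store.dedup_ratio())  # 24 16 1.5
```

The two 12-byte writes share their first eight bytes, so only four unique 4-byte chunks are kept: 24 logical bytes occupy 16 physical bytes, a 1.5:1 deduplication ratio. Real ratios depend heavily on how much of the data repeats between backups.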

Data Deduplication Products

FalconStor Virtual Tape Library (VTL) is a data center backup and recovery solution from FalconStor, a company headquartered in Melville, New York.

Clear Analytics is a business intelligence solution that enables non-technical end users to perform analytics by combining their existing knowledge of Excel with a built-in query builder. Key features include Dynamic Data Refresh, Data Share, and In-Excel Collaboration.

StarDQ is a real-time solution for cleansing, deduplicating, and enriching enterprise data. By integrating StarDQ, organizations can cleanse, match, and unify data across multiple data sources and data domains. According to the vendor, the goal is to ensure that data is a strategic, trustwort...

StorReduce is a software-defined, scale-out deduplication service for the cloud. The product reduces object storage costs, speeds up data transfer, and enables data reuse.