Item: OpenText Vertica
Rating: 9
Author: Traian Antonescu

Use Cases and Deployment Scope

Vertica is our main data warehouse. It is used as a source for most of our analytic reports as well as for all data analysis activities. We also use it in a non-traditional fashion, more like a data processing engine for solving problems at scale (matching, statistics, correlating sources, etc.). It runs in AWS, with data loaded from and unloaded to S3.

Pros and Cons

IO-optimized: it's a columnar store with no indexing structures to maintain as in traditional databases. Indexing is achieved by storing the data sorted on disk, and the sorting itself runs transparently as a background process.
Reduced data storage footprint through advanced encoding schemes (RLE, common-delta, etc.) and compression, plus the ability to operate directly on the encoded data.
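To make the two pros above more concrete, here is a minimal Python sketch of the general idea, not Vertica's actual implementation: a sorted column is stored run-length encoded, a sum is computed directly on the runs without decoding them, and an equality lookup uses binary search over the sorted run values instead of a separate index. The function names are hypothetical, and common-delta encoding is not shown.

```python
from bisect import bisect_left
from itertools import groupby


def rle_encode(sorted_values):
    """Encode a sorted column as two parallel lists: run values and run lengths."""
    values, lengths = [], []
    for value, group in groupby(sorted_values):
        values.append(value)
        lengths.append(sum(1 for _ in group))
    return values, lengths


def rle_sum(values, lengths):
    """Sum the column without decoding it: each run contributes value * length."""
    return sum(v * n for v, n in zip(values, lengths))


def rle_count_equal(values, lengths, target):
    """Count rows equal to target using binary search over the run values.

    Because the column is stored sorted, the run values are sorted too,
    so locating the target needs no separate index structure.
    """
    i = bisect_left(values, target)
    if i < len(values) and values[i] == target:
        return lengths[i]
    return 0


# A sorted, low-cardinality column: 10 rows collapse into 3 runs.
column = [1, 1, 1, 1, 2, 2, 2, 5, 5, 5]
values, lengths = rle_encode(column)
print(values, lengths)                       # [1, 2, 5] [4, 3, 3]
print(rle_sum(values, lengths))              # 25
print(rle_count_equal(values, lengths, 2))   # 3
```

The work done by both queries scales with the number of runs rather than the number of rows, which is why sorting plus RLE pays off most on low-cardinality columns.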

It could use some work on integrating better with cloud providers and open source technologies. For AWS, you will find an AMI in the Marketplace, and a connector for loading data directly from S3 was recently created. The last release added integration with Kafka, which can help.
Managing large workloads (concurrent queries) is a bit challenging.
A way to estimate the duration of currently executing queries would be helpful. Vertica provides some counters for the query execution engine that are helpful, but some may find them confusing.
Unloading data over JDBC is very slow. We've had to come up with alternatives based on vsql, etc. There is no clean, official way to unload data.
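A minimal Python sketch of the kind of vsql-based workaround mentioned above (not the team's actual tooling): it shells out to vsql with unaligned (-A), tuples-only (-t) output so a query's rows are written straight to a delimited file via -o. The host, user, database, table, and function names are placeholders, and authentication options are deliberately left out.

```python
import subprocess


def build_vsql_unload_cmd(host, user, database, query, out_path, sep="|"):
    """Build a vsql command that writes a query's rows to a delimited file.

    -A: unaligned output, -t: rows only (no header or row-count footer),
    -F: field separator, -c: run a single command, -o: send output to a file.
    Authentication options are intentionally omitted from this sketch.
    """
    return [
        "vsql",
        "-h", host,
        "-U", user,
        "-d", database,
        "-A", "-t",
        "-F", sep,
        "-c", query,
        "-o", out_path,
    ]


def unload(host, user, database, query, out_path, sep="|"):
    """Run the unload; raises CalledProcessError if vsql exits non-zero."""
    cmd = build_vsql_unload_cmd(host, user, database, query, out_path, sep)
    subprocess.run(cmd, check=True)


# Hypothetical connection details; only builds and prints the command.
print(build_vsql_unload_cmd("vertica.example.internal", "analyst", "analytics",
                            "SELECT * FROM public.events", "/tmp/events.psv"))
```

Calling `unload(...)` with real connection details would run the export. Unaligned output does not quote fields, so pick a separator that cannot appear in the data; splitting a large table into several such queries and running them in parallel is a common way to speed this up further.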

Return on Investment

Vertica increased our productivity in analyzing data and validating simple proofs of concept with our data.
Results of analytical queries produced from Vertica are used by all departments as well as part of some of our products.

Alternatives Considered

EMC Greenplum HD, Amazon Redshift and IBM Netezza Data Warehouse Appliances

Vertica is much easier to manage: it's just software (unlike Netezza), it's easier to scale and extend, and it has a very powerful query execution engine and storage layer. Other solutions (e.g. Greenplum) are essentially PostgreSQL clones that were extended to run at scale but still keep their traditional database features (e.g. indexes, materialized views, etc.), whereas Vertica was built from scratch with performance in mind. Five years ago, Vertica's storage layer was quite advanced, with very few contenders. Now more products are copying it, and lately you can find features like RLE (run-length encoding) even in open source columnar formats such as Parquet and ORC. To keep up, Vertica has been extended with support for Hadoop, Kafka, unstructured data (FlexTables), etc.

Other Software Used

Amazon Elastic MapReduce, Elasticsearch, Apache Spark

Likelihood to Recommend

Vertica is not a silver bullet, but in my experience, in 9 out of 10 cases where you need an analytical database, Vertica is probably the answer.

Currently we're using Vertica more as a data processing engine in conjunction with a Hadoop cluster, as some of the steps (e.g. iterative processing steps) are far more efficient and easier to manage in Vertica than in Hadoop. We also had a pretty good experience using it with Storm and Hadoop.

At the same time, using Vertica as a traditional OLTP database, with many small transactions inserting, deleting, or updating data, is not going to take you very far, so that's an obvious case where Vertica is not recommended.

Fast and powerful analytics platform
