- Analytical querying, thanks to built-in analytical functions that actually perform well across terabytes of data.
- Ingestion of data. We can easily send billions of rows to Vertica via the WOS (Write Optimized Store), and the data is ready for use immediately.
- Efficient storage of data. What is terabytes of raw data takes up only gigabytes of disk space once ingested into Vertica.
- Management! The Management Console is intuitive and useful, making it easier to keep an eye on your cluster than with any comparable product I have used.
- Deletion is tough in Vertica. Because one of our larger fact tables changes rapidly, we need to run purges regularly. Those purges can take a day and delay other processes while they run. It would be nice if, when I hit delete, it really deleted.
- Permissions on table manipulation are a bit lacking. To edit a table's structure you have to be its owner, i.e. the creator of the table. This makes it tough to set up true administrators who can maintain each other's work.
- Speed. Even on tables with 20 billion+ rows, Vertica performs reasonably well.
- Analytical functions. Some of Vertica's advanced functions enable interesting and complex analyses.
- Reliability. We have never run into reliability issues with Vertica.
- Data size limitations. Beyond a certain threshold Vertica breaks down. Because of this, we are not able to put all our data in Vertica and have to resort to Scala/Hive on Hadoop.
- Pricing: Vertica can get pretty expensive with large data sizes.
- Speed: Queries could always be faster!
- Limited options for querying clients: We primarily use Vertica from our terminals. The available GUI clients are ugly and outdated. Querying from the terminal is sometimes annoying: query runtime is shown only in milliseconds with no way to change the unit, and results are hard to read when there are more columns than fit on the screen, etc.
- Extremely fast query performance - Vertica is one of the fastest query engines out there.
- Scales to TBs - Scales reasonably well up to 10-20 nodes and tens to hundreds of TB of data.
- Easy to Use - Fairly easy to use; we made quite a lot of headway with just one person running it for a while.
- Petabyte-Scale Data - Vertica just cannot deal with this; it starts to crumble beyond hundreds of TB of data.
- Concurrent Usage - Vertica starts to experience significant backpressure as the number of concurrent users grows. We had trouble scaling past 20-30 users and had to invent our own queuing strategies.
- Vertical Stack - Storage and compute tiers live in one stack, which doesn't help the cause of scaling. Other systems take advantage of storage and compute being separate tiers (e.g., HDFS + Presto).
Scaling to petabytes of data and thousands of daily active users (DAU) is Vertica's weak point. The system is just not designed for large-scale usage and still has a long way to go to improve scalability. There are experiments running the Vertica query engine on top of HDFS that seem promising; however, if you already have the Hadoop ecosystem, you are better off going the HDFS + Presto/Impala/SparkSQL route. But if you are in the Hadoop ecosystem, you are probably already investing a lot in ops.
- IO optimized - It's a columnar store with no indexing structures to maintain as in traditional databases; instead, indexing is achieved by storing the data sorted on disk, and that sorting runs transparently as a background process.
- Reduced storage footprint through advanced encoding schemes (RLE, common-delta, etc.) and compression, plus the ability to operate directly on the encoded data.
- Integration with cloud providers and open-source technologies could use some work. For AWS, you will find an AMI in the Marketplace, and a connector for loading data directly from S3 was recently added. The latest release added Kafka integration, which can help.
- Managing large workloads (concurrent queries) is a bit challenging.
- A way to estimate how long currently executing queries will take would be helpful. Vertica provides some counters for the query execution engine that are useful, though some may find them confusing.
- Unloading data over JDBC is very slow. We've had to come up with alternatives based on vsql, etc. There is no clean, official way to unload data.
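To make the encoding point concrete, here is a toy Python sketch of run-length encoding (RLE) on a single column. It assumes nothing about Vertica's actual on-disk format, and the function names are hypothetical. It illustrates two claims from the reviews: storing data sorted groups equal values into long runs (so the column compresses well), and a simple filter-count can be answered by summing run lengths without decoding the column.

```python
from itertools import groupby
from typing import List, Tuple

# A run is (value, number of consecutive repeats). Toy model only,
# not Vertica's real storage format.
Run = Tuple[str, int]


def rle_encode(column: List[str]) -> List[Run]:
    """Collapse consecutive repeated values into (value, count) runs."""
    return [(value, len(list(group))) for value, group in groupby(column)]


def rle_decode(runs: List[Run]) -> List[str]:
    """Expand runs back into the original column values."""
    return [value for value, length in runs for _ in range(length)]


def count_where_encoded(runs: List[Run], target: str) -> int:
    """Count rows equal to target by summing run lengths, without decoding."""
    return sum(length for value, length in runs if value == target)


states = ["CA", "NY", "CA", "TX", "NY", "CA"]
unsorted_runs = rle_encode(states)        # no adjacent repeats: 6 runs
sorted_runs = rle_encode(sorted(states))  # sorted first: 3 runs
print(len(unsorted_runs), len(sorted_runs))  # 6 3
print(count_where_encoded(sorted_runs, "CA"))  # 3
```

The unsorted column gains nothing from RLE, while the sorted copy shrinks to three runs; on real tables with billions of rows and low-cardinality sort columns, that gap is what turns terabytes into gigabytes.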
Vertica is not a silver bullet, but in my experience, in 9 out of 10 cases where you need an analytical database, Vertica is probably the answer.
Currently we use Vertica more as a data processing engine alongside a Hadoop cluster, because some steps (e.g., iterative processing steps) are far more efficient and easier to manage in Vertica than in Hadoop. We have also had a pretty good experience using it with Storm and Hadoop.