Item: Elasticsearch
Rating: 10
Author: Colby Shores

Overall Satisfaction with Elasticsearch

Use Cases and Deployment Scope

We use Elasticsearch as the storage and search component of our logging infrastructure (the Elastic Stack). Once Logstash has broken each log entry apart into its individual fields, each with its own type, we store those records in Elasticsearch. Kibana queries Elasticsearch to display the resulting data. We also use Elasticsearch to display the cluster status for each of our markets across our entire web cluster, using an internal reporting tool we wrote.

Pros and Cons

Pros

Effortless to set up. Literally set the memory thresholds for Java and start throwing JSON-formatted records into the database; it "Just Works". Even clustering is automated, as the cluster finds other Elasticsearch servers on the network and assigns each a name.
Very simple to use, either through its RESTful API (à la curl) or via its speedy transport protocol on port 9300. Once records are added, the very easy-to-use Apache Lucene query syntax is supported to extract data.
Its search capabilities are fast on huge datasets, even on very modest hardware. Our organization operates hundreds of servers taking thousands of requests a second, each with its own log and a 2-week retention. The Elasticsearch server we recently decommissioned was a Pentium 4 NetBurst-class Xeon, and it rarely skipped a beat.
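To illustrate the "JSON in, Lucene query out" workflow described above, here is a minimal sketch using only the Python standard library. It builds the HTTP requests without sending them. The host, index name, type name, and field names are assumptions for illustration, not details from this review. The `/<index>/<type>` path matches older releases such as 1.x; newer versions use `/<index>/_doc` instead.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

# Assumption: a node listening on the default HTTP port 9200.
BASE_URL = "http://localhost:9200"


def build_index_request(index, doc_type, record, base_url=BASE_URL):
    """Build (but do not send) a POST that stores one JSON record.

    POSTing without an id lets Elasticsearch generate one.
    """
    url = f"{base_url}/{quote(index)}/{quote(doc_type)}"
    body = json.dumps(record).encode("utf-8")
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/json"})


def build_search_request(index, lucene_query, size=10, base_url=BASE_URL):
    """Build (but do not send) a URI search using Lucene query-string syntax."""
    params = urlencode({"q": lucene_query, "size": size})
    return Request(f"{base_url}/{quote(index)}/_search?{params}", method="GET")


# Hypothetical log record and query, for illustration only.
index_req = build_index_request(
    "logs", "event", {"host": "web01", "status": 500, "message": "upstream timeout"}
)
search_req = build_search_request("logs", "status:500 AND host:web01")
```

Passing either request to `urllib.request.urlopen` would send it to a running node, and the response body could then be parsed with `json.load`.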

Cons

Setting Java memory thresholds can be a pain for those not accustomed to things like Eden Space and Old Generation, which can lead to over-allocation or, more likely, under-allocation. Apache Solr had a similar issue. It would be nice if the program took an extra step and dogfooded its own advice by analyzing the system and processes to return a solid recommendation for that configuration. The proper configuration is outlined in the documentation; it would be nice if that were automated.
The only health check that Elasticsearch reports back is a "red" status without any real solid information about what is going on, though it's usually memory thresholds or disk I/O. I am currently on Elasticsearch 1.5, so that may have changed in newer versions. When the status goes "red", I, as the administrator of the software, feel like I lose control of what's going on, which should rarely happen. Something more verbose would eliminate that.
This is more of a critique of the Elastic Stack in general. The whole top-to-bottom stack is starting to suffer from feature creep, taking on things that are better suited to other software and raising the barrier to entry for people setting up a robust logging infrastructure. Elasticsearch, as a storage and search engine, is pretty streamlined, but I can see that the tools that comprise the ELK Stack are going to require a certification with constant study at some point. During a major Logstash release a while back, it literally took a month to learn a new language because Elastic completely changed the syntax. For a medium-sized organization with only a couple of admins, where time is money, that is a pretty high bar. They really should work on refining and automating the tools and search engine they have, instead of shoehorning changes onto an already rock-solid foundation.
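The first two points above can be partly worked around with a small helper script. The sketch below is an assumption-laden illustration, not the author's tooling. `recommend_heap_mb` applies the commonly documented rule of thumb: give the JVM heap about half of physical RAM and stay under roughly 32 GB. `explain_health` turns an already-parsed `GET /_cluster/health` response into readable hints. Both function names are hypothetical.

```python
# Meaning of the three cluster health colours.
STATUS_MEANING = {
    "green": "all primary and replica shards are allocated",
    "yellow": "all primaries are allocated, but some replicas are not",
    "red": "at least one primary shard is not allocated",
}


def recommend_heap_mb(total_ram_mb, ceiling_mb=31 * 1024):
    """Suggest a JVM heap size in MB from the machine's physical RAM.

    Half of RAM goes to the heap (the rest is left for the OS file cache),
    capped below ~32 GB so the JVM can keep using compressed object pointers.
    """
    return min(total_ram_mb // 2, ceiling_mb)


def explain_health(health):
    """Turn a parsed /_cluster/health response (a dict) into readable hints."""
    status = health["status"]
    hints = [f"{status}: {STATUS_MEANING.get(status, 'unknown status')}"]
    # Report only the shard counters that are non-zero.
    for key in ("unassigned_shards", "initializing_shards", "relocating_shards"):
        count = health.get(key, 0)
        if count:
            hints.append(f"{key}={count}")
    if health.get("timed_out"):
        hints.append("the health request itself timed out")
    return hints


# Hypothetical inputs, for illustration only.
heap = recommend_heap_mb(8192)
report = explain_health({"status": "red", "number_of_nodes": 3, "unassigned_shards": 5})
```

On 1.x releases the recommended value would typically be passed through the `ES_HEAP_SIZE` environment variable. Note that the health response only explains shard allocation; it will not say whether memory pressure or disk I/O caused the problem, so node logs and stats still need checking.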

Return on Investment

When we were initially exploring logging solutions, Splunk was the only vendor in town and they were extremely expensive ($60,000). We haven't revisited them since, as Elasticsearch has met all of our needs.
We haven't spent anything but admin hours to maintain our Elasticsearch cluster. So far we haven't incurred any other cost of ownership, as I have been maintaining the cluster myself.
We have a huge project to grow a new part of our business, but I am not sure I can spend the time to really update the cluster to support the new Logstash features and any syntax changes, so I am reluctant to do so. Time is increasingly scarce, so catering to the latest and greatest features that offer little to our organization isn't something we are interested in pursuing, though we are going to need to update the Elastic Stack eventually.
Since all of our metrics are in Elasticsearch, we have had a nice trove of data to build our apps around, apps that require specific metrics. Prior to Elasticsearch, we had to build our own tools to handle that metric collection. The cost savings here is that we maintain a simple script that reports information back to our reporting interface, versus rolling our own database metric solution that must be modified for every app we develop. That has equated to a huge saving in developer hours in our organization.

Alternatives Considered

Apache Solr

Apache Solr is the closest competitor to Elasticsearch from a search engine perspective. Elasticsearch is simple and streamlined in its configuration. Taken as a whole, Apache Solr is more robust as a storage engine from a developer perspective, but Elasticsearch has the entire Elastic Stack at its disposal, which sets it apart. Our organization looked into Splunk; however, I wasn't with the organization at that time, so I can't give a solid perspective on it.

Other Software Used

Apache Solr, VMware ESXi, Apache Web Server

Likelihood to Recommend

Elasticsearch is, hands down, the absolute best solution for logging in a virtualization environment. The Kibana front end to Elasticsearch is extremely intuitive; even computer novices can be trained to chain together tags in the Apache Lucene syntax to extract the data they need. Once the deploy process is nailed down and the system is engineered, the logging structure can remain fairly static until the next major revision. Compared to Splunk, Elasticsearch run by an administrator well versed in the suite will save an organization upwards of tens of thousands of dollars a year, even with the caveats mentioned earlier.
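As a rough illustration of what "chaining together tags" in Lucene syntax looks like, the sketch below joins field/value pairs into one query string of the kind typed into the Kibana search bar. The helper names and the field names (`tags`, `host`) are hypothetical; values are quoted as phrases so that spaces and most special characters need no further escaping.

```python
def lucene_term(field, value):
    """Render one field:"value" clause, quoting the value as a phrase."""
    # Inside a quoted phrase only backslashes and double quotes need escaping.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def chain_terms(filters, operator="AND"):
    """Join field/value pairs into a single Lucene query string."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be AND or OR")
    return f" {operator} ".join(lucene_term(f, v) for f, v in filters.items())


# Hypothetical tags, for illustration only.
query = chain_terms({"tags": "nginx", "host": "web01"})
```

Here `query` holds `tags:"nginx" AND host:"web01"`, which can be pasted into Kibana or sent as the `q` parameter of a search request.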

For a developer looking for a quick and simple search engine that requires little configuration, Elasticsearch is fast and a perfect fit. Literally throw JSON records into the database and push a request to get JSON out; it is exceptionally straightforward.

Comments


Elasticsearch is a simple, straightforward search engine that literally anyone can get started with!