Apache Solr and Elasticsearch are both open-source enterprise search software solutions that allow users to search and retrieve data within an organization. Both software options integrate with tools like databases or intranets where information can be collected or displayed. Businesses of all sizes use both Apache Solr and Elasticsearch.
Apache Solr and Elasticsearch both provide essential enterprise search features, including data retrieval and display. Despite this, both software options have a few standout features that set them apart from each other.
Apache Solr offers robust text search features that allow users to search for materials by their content. Apache Solr has many contributors to its open-source code. Developers and code committers for Apache Solr are selected from that community of contributors. This approach to development means bugfixes and updates are frequent, and features can be developed quickly. Lastly, Apache Solr provides detailed documentation for developers, including multiple examples.
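As a rough illustration of the content search described above, a Solr query is usually sent as URL parameters to a collection's select handler. The sketch below only builds such a URL and sends nothing over the network. The host, the collection name `docs`, and the field names `content` and `type` are assumptions made for the example and are not taken from any particular deployment.

```python
from urllib.parse import urlencode


def build_solr_select_url(base_url, collection, text, doc_type=None, rows=10):
    """Build a URL for Solr's /select handler (no request is sent)."""
    # Escape embedded double quotes so the phrase query stays well-formed.
    phrase = text.replace('"', '\\"')
    params = [
        ("q", f'content:"{phrase}"'),  # phrase search on a hypothetical `content` field
        ("rows", str(rows)),           # maximum number of results to return
        ("wt", "json"),                # ask for a JSON response
    ]
    if doc_type is not None:
        # `fq` is a filter query: it narrows the result set without affecting scoring.
        params.append(("fq", f"type:{doc_type}"))
    return f"{base_url.rstrip('/')}/solr/{collection}/select?{urlencode(params)}"


url = build_solr_select_url(
    "http://localhost:8983", "docs", "quarterly report", doc_type="pdf"
)
print(url)
```

The resulting URL could be fetched with any HTTP client once a Solr instance with a matching schema is available.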
Elasticsearch is lightweight enough that a business can install and run it in a matter of minutes. Its configuration is based on JSON, which keeps configuration files simple, if a little inflexible in terms of documentation. JSON compatibility also makes Elasticsearch a great choice when working with JSON applications. Elasticsearch focuses on complex querying and filtering, though it also offers basic text search. Lastly, Elasticsearch is designed for the cloud and supports clustering, which makes it a highly scalable option.
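To make the JSON-based querying and filtering mentioned above more concrete, here is a minimal sketch of an Elasticsearch search request body that combines a text match with structured filters. It only builds and prints the JSON and does not contact a cluster. The field names `title`, `status`, and `price` are hypothetical and would need to exist in the index mapping.

```python
import json


def build_search_body(text, status=None, max_price=None):
    """Build an Elasticsearch Query DSL body: text match plus optional filters."""
    # `must` clauses contribute to relevance scoring.
    must = [{"match": {"title": text}}]
    # `filter` clauses only include or exclude documents; they do not affect scoring.
    filters = []
    if status is not None:
        filters.append({"term": {"status": status}})
    if max_price is not None:
        filters.append({"range": {"price": {"lte": max_price}}})
    return {"query": {"bool": {"must": must, "filter": filters}}}


body = build_search_body("wireless headphones", status="in_stock", max_price=100)
print(json.dumps(body, indent=2))
```

A body like this would typically be sent to an index's `_search` endpoint. Keeping the structured conditions in `filter` rather than `must` is a common choice because filters skip scoring.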
Though Apache Solr and Elasticsearch have robust sets of features, they both have a few limitations that are important to consider.
Apache Solr offers text search features but is limited when it comes to more complex querying and filtering. This lack of complex querying can make Apache Solr a poor choice for applications that need non-text search features. Additionally, Apache Solr is heavier than Elasticsearch, which can make installation more challenging for lightweight applications.
Elasticsearch is open-source in that all users have access to the source code. However, unlike many open-source technologies, all changes to the code must be approved by Elastic developers. As a result, Elasticsearch provides the financial benefits of open-source software but doesn’t offer the same level of community development as Apache Solr. Additionally, though Elasticsearch provides complex search features, its text search features are more limited compared to Apache Solr.
Apache Solr and Elasticsearch are both open-source technologies, meaning their source code is available for free. Despite this, both software options also have vendors that provide cloud hosting services. Pricing for Apache Solr and Elasticsearch depends on factors such as the vendor, support needs, and number of indexed nodes. Apache Solr pricing usually starts around $10.00 per month, while Elasticsearch starts around $16.00 per month.
Provided by the TrustRadius Research Team
Published on April 24, 2020
Likelihood to Recommend
- Easy to get started with Apache Solr. Whether it is tackling a setup issue or trying to learn some of the more advanced features, there are plenty of resources to help you out and get you going.
- Performance. Apache Solr allows for a lot of custom tuning (if needed) and provides great out of the box performance for searching on large data sets.
- Maintenance. After setting up Solr in a production environment there are plenty of tools provided to help you maintain and update your application. Apache Solr comes with great fault tolerance built in and has proven to be very reliable.
- Super-fast search on millions of documents. We've got over 2 billion documents in our index and the retrieval speeds are still under 1 second.
- Analytics on top of your search. If you organize your data appropriately, Elasticsearch can serve as a distributed OLAP system
- Elasticsearch is great for geographic data as well, including searching and filtering with geojson, and a variety of geospatial algorithms.
- These examples are due to the way we use Apache Solr. I think we have had the same problems with other NoSQL databases (but perhaps not the same solution). High volumes of data and a lot of users were the causes.
- We have a lot of classifications and a lot of data for each classification. This gave us several problems:
- First: We couldn't keep all our data in Solr, so we keep all data in our MySQL DB and search it in Solr. That means we need to update the two databases and keep them in sync at the same time.
- Second: We needed several load balanced Solr databases.
- Third: We needed to update all the databases and keep old data status.
- Setting aside problems due to our lack of experience, the main Solr problem came from the frequency of updates versus the validation of several databases. We encountered several locks because of this (our ops team didn't want to use real clustering, so not all databases were updated). Error messages were not always clear, and we needed several days to understand the problems.
- Setting Java memory thresholds can be a pain for those not accustomed to things like Eden Space & Old Generation, which can lead to over-allocation or, more likely, under-allocation. Apache Solr had a similar issue. It would be nice if the program would take an extra step and dogfood its own advice by analyzing the system & processes to return a solid recommendation for that configuration. The proper configuration information is outlined in the documentation; it would be nice if that were automated.
- The only health check that Elasticsearch reports back is a "red" status without any real solid information about what is going on, though it's usually memory thresholds or disk I/O. I am currently on Elasticsearch 1.5, so that may have changed in newer versions. When the status goes "red", I, as the administrator of the software, feel like I lose control of what's going on, which should rarely happen. Something more verbose would eliminate that.
- This is more of a critique of the Elastic Stack in general. The whole top-to-bottom stack is starting to get feature creep, with things that are better suited to other software, and it is raising the barrier to entry for people getting started with setting up a robust logging infrastructure. Elasticsearch, as a storage search engine, is pretty streamlined, but I can see that the tools that comprise the ELK Stack are going to require a certification with constant study at some point. During a major Logstash release a while back, it literally took a month to learn a new language because Elastic completely changed the syntax. For a medium-sized organization with only a couple of admins, that is a pretty high bar where time is money. They really should work on refining and automating the tools & search engine they have, instead of shoehorning changes onto an already rock-solid foundation.
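The analytics and geographic filtering that reviewers mention above can be sketched as a single Elasticsearch request body. This is only an illustration under stated assumptions: `location` is assumed to be mapped as a `geo_point` field, the field passed as `bucket_field` is assumed to be a keyword field, and the aggregation name `by_bucket` is arbitrary. The code builds the JSON and does not contact a cluster.

```python
import json


def build_geo_analytics_body(lat, lon, distance_km, bucket_field):
    """Count documents per bucket_field value within a radius of a point."""
    return {
        # size 0 returns only aggregation results, not the matching documents.
        "size": 0,
        "query": {
            "bool": {
                "filter": [
                    {
                        # Keep documents whose hypothetical `location` field
                        # lies within the given distance of (lat, lon).
                        "geo_distance": {
                            "distance": f"{distance_km}km",
                            "location": {"lat": lat, "lon": lon},
                        }
                    }
                ]
            }
        },
        # A terms aggregation groups the filtered documents by field value.
        "aggs": {"by_bucket": {"terms": {"field": bucket_field}}},
    }


geo_body = build_geo_analytics_body(48.85, 2.35, 10, "category")
print(json.dumps(geo_body, indent=2))
```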
Return on Investment
- It's enabled us to deliver fast, relevant search results on our new website. The site is still in beta and being actively developed so our complete ROI is still unknown.
- It integrates very well with Drupal so it has saved us from having to develop a custom solution.
- Faster searches on our application have resulted in better usability and increased application use
- Analytics dashboard has given our managers a better understanding of day-to-day activities
- Being a backup data store, we need not touch SQL database while doing data dumps for local data science projects