Apache Solr - Searching and matching efficiency
September 01, 2016


Philippe Kozak | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User

Overall Satisfaction with Apache Solr

I worked as CTO for a pure-play company in the real estate sector. We had to design and build five websites for the customers of real estate agencies, and we managed about 2 million classifieds. This market is highly competitive. At the same time, we doubled our number of unique users. So my team and I decided to use a NoSQL database for our search engine, and we chose Apache Solr 4. This database redesign was done alongside a redesign of our PHP code with Zend Framework. All five websites were redesigned over a period of 2.5 years. We first did a proof of concept with Apache Solr when we needed to redesign our registered customer searches (matching 500k searches against 2M classifieds).
  • Faceted navigation and field collapsing/grouping: filtering and quick results were what we needed for our websites. Our customers needed these features to get good, efficient results.
  • We tested them with our customers' registered searches (customers received every new listing matching their registered searches by email and/or mobile push). The results were incredible compared with our old system (plain MySQL queries).
  • Note: we didn't put all our data in Solr, just what we needed for search. The other data stayed in our MySQL database.
  • Auto-suggest: our old auto-suggest wasn't performing well. With Apache Solr, our new one worked really well! Suggestions came back quickly and were relevant.
  • We also extended auto-suggestion with geo-spatial data and it worked well.
  • Hit highlighting: we used this feature and had no problems or nasty surprises.
  • Keeping the status of all data during data updates (details follow)
  • These examples are specific to the way we used Apache Solr. I think we would have had the same problems with other NoSQL databases (though perhaps not the same solutions). High data volumes and a large number of users were the causes.
  • We have a lot of classifications and a lot of data for each classification. This gave us several problems:
  • First: we couldn't keep all our data in Solr. So we kept all the data in our MySQL database and only the search data in Solr, which meant we had to make sure the two databases were updated and kept in sync at the same time.
  • Second: we needed several load-balanced Solr databases.
  • Third: we needed to update all the databases while keeping the status of old data.
  • Leaving aside the problems caused by our own lack of experience, the main Solr problem came from the frequency of updates versus the validation of several databases. We encountered several locks because of this (our ops team didn't want to use real clustering, so not all databases were updated). Error messages were not always clear, and we needed several days to understand the problems.
  • Positive: users received efficient email and push notifications about new listings (B2C); we saw more agency contacts, more business, less agency turnover and more B2B sales
  • Users had a good search experience, more unique users and more agency contacts
Some people on my team tried MongoDB and ran into several problems (I don't remember which ones).

Elasticsearch would have been a good choice, but we didn't have it in mind when we made our decision.
It is well suited to classified search and filtering, and to high-volume data matching.
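
To give an idea of the kind of request behind faceted navigation, grouping and hit highlighting on classifieds, here is a minimal sketch in Python. It only builds the query string for a Solr select handler and sends nothing. The field names (property_type, city, agency_id, description) and the core URL are assumptions made up for illustration, not our real schema; the request parameters (facet, facet.field, group, group.field, hl, hl.fl, fq) are standard Solr ones.

```python
from urllib.parse import urlencode


def build_classified_query(text, city=None,
                           facet_fields=("property_type", "city"),
                           group_field="agency_id", rows=20):
    """Return Solr request parameters as a list of (name, value) pairs.

    All field names are hypothetical examples, not a real schema.
    """
    params = [
        ("q", text),                   # full-text query typed by the user
        ("wt", "json"),                # ask for a JSON response
        ("rows", rows),
        ("facet", "true"),             # faceted navigation
        ("facet.mincount", 1),         # hide facet values with no match
        ("group", "true"),             # field collapsing / grouping
        ("group.field", group_field),
        ("group.limit", 3),            # at most 3 classifieds per group
        ("hl", "true"),                # hit highlighting
        ("hl.fl", "description"),
    ]
    # One facet.field parameter per facet shown on the page.
    params += [("facet.field", name) for name in facet_fields]
    if city is not None:
        # A filter query narrows the result set without changing scoring.
        escaped = city.replace("\\", "\\\\").replace('"', '\\"')
        params.append(("fq", f'city:"{escaped}"'))
    return params


def build_select_url(base_url, params):
    """Join a core URL (assumed) and the parameters into a select URL."""
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    query = build_classified_query("3 rooms balcony", city="Lyon")
    # The host and core name below are placeholders.
    print(build_select_url("http://localhost:8983/solr/classifieds", query))
```

The filter on city goes into fq rather than into q so that it restricts the results without affecting relevance; the same function could serve both the website search pages and the matching of registered searches, each with its own filters.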