Apache Solr is an open-source enterprise search server.
N/A
IBM Watson Discovery
Score 9.3 out of 10
N/A
IBM offers Watson Discovery, a natural language processing (NLP) application with options to measure sentiment and to detect entities, semantic roles, and other concepts.
Solr spins up nicely and works effectively for small enterprise environments, providing helpful mechanisms for fuzzy searches and faceted searching. For larger enterprises with complex business solutions, you'll find the need to hire an expert Solr engineer to optimize the powerful platform to your needs. Internationalization is tricky with Solr, and many hosting solutions may limit you to a Latin character set.
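To make the fuzzy and faceted searching mentioned above more concrete, here is a minimal sketch of how such a request could be built against Solr's standard `/select` handler. The host, the `products` collection, and the `name` and `category` fields are hypothetical names chosen for illustration, and the helper assumes a single plain search term with no special characters that would need escaping.

```python
from urllib.parse import urlencode


def build_solr_query(base_url, collection, field, term, facet_field,
                     max_edits=1, rows=10):
    """Build a Solr /select URL combining a fuzzy term query with one facet.

    Hypothetical helper for illustration; it only builds the URL and does
    not contact a Solr server.
    """
    # Solr's fuzzy operator (term~N) accepts an edit distance of 0, 1, or 2.
    if max_edits not in (0, 1, 2):
        raise ValueError("edit distance must be 0, 1, or 2")
    params = [
        ("q", f"{field}:{term}~{max_edits}"),  # fuzzy match on one field
        ("facet", "true"),                     # turn faceting on
        ("facet.field", facet_field),          # count results per field value
        ("rows", str(rows)),                   # number of documents returned
        ("wt", "json"),                        # ask for a JSON response
    ]
    return f"{base_url.rstrip('/')}/solr/{collection}/select?{urlencode(params)}"


# A misspelled "laptop" can still match thanks to the fuzzy operator.
url = build_solr_query("http://localhost:8983", "products",
                       "name", "labtop", "category")
print(url)
```

The resulting URL can be sent with any HTTP client; the facet counts come back alongside the matching documents in the same response.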
Overall, IBM Watson Discovery is an amazing technology that we use with our clients to address various business problems. The biggest challenge has always been ingesting, analyzing, enriching, and searching huge collections of documents so that our end users and SMEs can find what they need, which reduces the time and effort spent daily on manual searches through various document collections. We have managed to reduce manual work by over 80%, and our SMEs now apply their skills to gathering insights rather than doing manual work.
Easy to get started with Apache Solr. Whether it is tackling a setup issue or trying to learn some of the more advanced features, there are plenty of resources to help you out and get you going.
Performance. Apache Solr allows for a lot of custom tuning (if needed) and provides great out-of-the-box performance for searching large data sets.
Maintenance. After setting up Solr in a production environment there are plenty of tools provided to help you maintain and update your application. Apache Solr comes with great fault tolerance built in and has proven to be very reliable.
These examples are due to the way we use Apache Solr. I think we would have had the same problems with other NoSQL databases (but perhaps not the same solution). High volumes of data and a lot of users were the causes.
We have a lot of classifications and a lot of data for each classification. This gave us several problems:
First: We couldn't keep all our data in Solr, so we keep all the data in our MySQL DB and search it in Solr. That means we need to be sure to update the two databases at the same time and keep them matched.
Second: We needed several load-balanced Solr databases.
Third: We needed to update all the databases while preserving the status of old data.
Leaving aside the problems caused by our own lack of experience, the main Solr problem came from the frequency of updates versus the validation of several databases. We encountered several locks because of this (our ops team didn't want to use real clustering, so not all of the databases were updated). Error messages were not always clear, and it took us several days to understand the problems.
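The need to keep MySQL and Solr matched, described above, is often handled with a periodic reconciliation pass. The sketch below is not the reviewer's solution; it assumes that each record carries a version number in both stores, and the function and variable names are hypothetical. Plain dictionaries stand in for the result of a MySQL query and a Solr query.

```python
def find_out_of_sync(db_versions, index_versions):
    """Compare id -> version maps from the source of truth and the index.

    Returns (to_reindex, to_delete):
      to_reindex: ids missing from the index or indexed at another version
      to_delete:  ids still in the index but gone from the database
    """
    to_reindex = sorted(
        doc_id
        for doc_id, version in db_versions.items()
        if index_versions.get(doc_id) != version
    )
    to_delete = sorted(
        doc_id for doc_id in index_versions if doc_id not in db_versions
    )
    return to_reindex, to_delete


# Stand-in data: "a" is stale in the index, "c" was never indexed,
# and "d" was deleted from the database but not from the index.
db = {"a": 3, "b": 1, "c": 2}
idx = {"a": 2, "b": 1, "d": 5}
to_reindex, to_delete = find_out_of_sync(db, idx)
print(to_reindex, to_delete)
```

Treating the relational database as the single source of truth keeps the comparison one-directional, so a failed index update can be repaired on the next pass instead of blocking the write.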
Discovery does not have an end-user interface which makes it a little cumbersome for the non-technical audience. One needs to figure out how to leverage open-source technology to create a simple UI to interface with Discovery, but I believe IBM Dev teams ought to create an intuitive UI similar to that of Watson Assistant and make it easier for both technical and non-technical audiences to use Discovery with ease.
Another issue we faced with Discovery was when we tried to ingest very technical documents with less text and more engineering diagrams and ended up with a bit of a mess because Discovery did not manage to parse them correctly, but that is being fixed by IBM Dev team as far as I know. IBM is enhancing its OCR to allow for better recognition and understanding of advanced engineering graphs and diagrams.
After having tested the newly released Data Miner on IBM Watson Discovery, my impression is that it is targeting a niche audience of SMEs and technical people who understand what to mine for and comprehend the overly complicated graphs generated. I believe IBM should work on simplifying the UI as well as the way graphs are being displayed to allow business users to leverage this awesome feature.
Similar to all IBM Watson and Salesforce product solutions, the overall support would be a 10/10. Their FAQs help with frequently experienced issues, and if you are still unable to figure something out, their customer service representatives are always super responsive. With instant chat functions available, it is easy to ask a quick question rather than sitting on hold.
Discovery differs from its competitors in its greater ease of implementation and its high level of natural language recognition. It is on par in integration resources such as APIs and workflow or process pipelines, but it loses on price for a high volume of documents and/or searches. If you own or plan to use other services from the IBM Watson family, there is no doubt that Watson Discovery is your best option. Another important point is whether you plan to use a cloud or an on-premise service (local server or private cloud).