Apache Lucene is an open source search engine library written in Java and available free of charge under the Apache License 2.0. It is closely associated with Apache Solr, and it has given rise to a number of sub-projects, such as Lucene.NET, Apache Tika, and Apache Nutch, all of which are now top-level Apache projects.
Lucene supports multiple query types, including phrase queries, wildcard queries, proximity queries, and range queries, and it ranks results so that the best matches appear first. It is supported by an online community, can run where resources are limited, and is reported by some users to perform well. It can search metadata or content by any field, can be integrated with web crawlers, and suits a variety of use cases. With PyLucene, a Python extension for accessing Java Lucene, users can also take advantage of Lucene's text indexing and searching capabilities from Python.
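To make the query types above more concrete, here is a minimal sketch of a positional inverted index in plain Python. It is a toy illustration and not Lucene's or PyLucene's API: the `ToyIndex` class, its method names, and the simple TF-IDF ranking are all assumptions made for this example, and the `slop` parameter is only a rough stand-in for a proximity query. Range queries are left out for brevity.

```python
import fnmatch
import math
import re
from collections import defaultdict


class ToyIndex:
    """A tiny in-memory positional inverted index (illustration only, not Lucene)."""

    def __init__(self):
        # term -> {doc_id -> [positions of the term in that document]}
        self.postings = defaultdict(dict)
        self.doc_count = 0

    def add(self, doc_id, text):
        self.doc_count += 1
        for pos, term in enumerate(re.findall(r"\w+", text.lower())):
            self.postings[term].setdefault(doc_id, []).append(pos)

    def term(self, word):
        """Term query: matching documents ranked by a simple TF-IDF score."""
        docs = self.postings.get(word.lower(), {})
        if not docs:
            return []
        idf = math.log(1 + self.doc_count / len(docs))
        scored = [(len(positions) * idf, doc_id) for doc_id, positions in docs.items()]
        # Highest score first; ties are broken by document id.
        return [doc_id for _, doc_id in sorted(scored, key=lambda s: (-s[0], s[1]))]

    def phrase(self, text, slop=0):
        """Phrase query; slop > 0 allows extra words between the phrase terms."""
        words = re.findall(r"\w+", text.lower())
        if not words or any(w not in self.postings for w in words):
            return []
        # Only documents that contain every word can match.
        candidates = set(self.postings[words[0]])
        for w in words[1:]:
            candidates &= set(self.postings[w])
        return sorted(d for d in candidates if self._in_order(d, words, slop))

    def _in_order(self, doc_id, words, slop):
        # Positions at which a match of the words seen so far can end.
        ends = set(self.postings[words[0]][doc_id])
        for w in words[1:]:
            following = self.postings[w][doc_id]
            ends = {q for q in following if any(0 < q - p <= 1 + slop for p in ends)}
            if not ends:
                return False
        return True

    def wildcard(self, pattern):
        """Wildcard query: '*' and '?' are expanded against the indexed terms."""
        matches = set()
        for term, docs in self.postings.items():
            if fnmatch.fnmatchcase(term, pattern.lower()):
                matches.update(docs)
        return sorted(matches)


index = ToyIndex()
index.add("a", "Lucene is a search library written in Java")
index.add("b", "Solr is a search server built on the Lucene library; Lucene does the indexing")
index.add("c", "Nutch is a web crawler")

print(index.term("lucene"))                    # ['b', 'a']: "b" uses the term twice
print(index.phrase("search library"))          # ['a']
print(index.phrase("search library", slop=5))  # ['a', 'b']
print(index.wildcard("luc*"))                  # ['a', 'b']
```

A real Lucene index adds analyzers, per-field indexing, on-disk segments, and a more refined scoring model, so this sketch only conveys the general idea of looking up terms in an index and ranking the matches.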
Apache Lucene was first made available in 1999 and became an Apache Software Foundation project in 2001. It can be used to implement Internet search engines, local single-site search, or search of private resources, as well as other kinds of tools, such as personalization or recommendation engines.
Lucene itself does not crawl the web or parse HTML. That functionality is supplied by optional, ancillary projects, some of them former Lucene sub-projects, such as Nutch. Lucene is also used inside various data stores, such as CrateDB and Elasticsearch.
Based on research by Schwarzer et al. (2016), Apache Lucene’s MLT (“MoreLikeThis”) function excels at finding items or articles that are closely related to a given one, yet it may also create linkages to items that are obscure or not as well known. While its results may be narrow, the authors state that a text-based approach can complement alternative (e.g. citation-based) search methods.
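The general idea behind a text-based "more like this" lookup can be sketched in a few lines of Python. This is not Lucene's MoreLikeThis implementation: the `more_like_this` function, its `max_query_terms` parameter, and the plain TF-IDF weighting are assumptions chosen for illustration. The sketch picks the most distinctive terms of a source document and ranks the other documents by how strongly they share those terms.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def more_like_this(source_id, docs, max_query_terms=25):
    """Rank other documents by overlap with the source's highest TF-IDF terms."""
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    # Document frequency: the number of documents in which each term occurs.
    doc_freq = Counter(term for counts in term_counts.values() for term in counts)
    n_docs = len(docs)

    def tfidf(term, count):
        return count * math.log(n_docs / doc_freq[term])

    # 1. Keep the source document's most distinctive terms.
    source = term_counts[source_id]
    top_terms = sorted(source, key=lambda t: (-tfidf(t, source[t]), t))[:max_query_terms]

    # 2. Score every other document by the weight of the terms it shares.
    scores = {}
    for doc_id, counts in term_counts.items():
        if doc_id == source_id:
            continue
        score = sum(tfidf(t, counts[t]) for t in top_terms if t in counts)
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties are broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))


docs = {
    "lucene": "lucene is a java search library for full text indexing",
    "solr": "solr is a search server built on the lucene search library",
    "nutch": "nutch is a web crawler",
    "tika": "tika extracts text and metadata from files",
}

print(more_like_this("lucene", docs, max_query_terms=8))  # ['solr', 'tika']
```

Because the score depends only on shared vocabulary, a document that covers the same topic in different words is never found, which is one way to picture why purely text-based results can be narrow.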
As an open source project, Apache Lucene is available free of charge, up to and including version 8.8.2. Older versions are also available free of charge from the Apache Archives.