Neural Search in Apache Solr: Dense Vector Fields, HNSW Graphs, and K‑Nearest Neighbor Implementation

This article explains how Apache Solr and Lucene implement neural search using dense vector fields, hierarchical navigable small‑world (HNSW) graphs, and approximate K‑nearest neighbor algorithms, covering configuration, custom codecs, indexing formats, and query parsers for vector‑based retrieval.

Architects Research Society

Jun 6, 2022

Sease, with Alessandro Benedetti (Apache Lucene/Solr committer and PMC member) and Elia Porciani (Sease R&D software engineer), contributed the first milestone of neural search in Apache Solr to the open-source community. The contribution relies on Apache Lucene's K-Nearest Neighbor (KNN) implementation.

Artificial Intelligence, Deep Learning and Vector Representations

Artificial intelligence (AI) refers to technologies that enable machines to learn and exhibit human-like intelligence. Recent advances in computing power have revived interest in AI, and it is now applied to fields such as software engineering and information retrieval. Deep learning introduced deep neural networks that can solve complex problems, including encoding queries and documents as dense vectors.

Dense Vector Representations

Traditional inverted indexes model text as sparse vectors: one dimension per term in the vocabulary, with most dimensions equal to zero for any given document. Dense vectors, in contrast, encode semantic meaning in a fixed number of dimensions that is far smaller than the vocabulary, and most of those dimensions hold non-zero values. Models such as BERT can encode text into dense vectors for retrieval.
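To make the contrast concrete, the sketch below represents the same short text twice: as a sparse term-count vector over a small vocabulary, and as a dense vector. The vocabulary, the text, and the class and method names are made up for illustration; nothing here is output from BERT or any real model.

```java
import java.util.List;

class SparseVsDense {
    // Builds a term-count vector with one dimension per vocabulary term.
    static int[] sparseTermCounts(List<String> vocabulary, String text) {
        int[] counts = new int[vocabulary.size()];
        for (String token : text.toLowerCase().split("\\s+")) {
            int dimension = vocabulary.indexOf(token);
            if (dimension >= 0) {
                counts[dimension]++;
            }
        }
        return counts;
    }

    // Counts the dimensions of a sparse vector that carry a non-zero value.
    static int countNonZero(int[] vector) {
        int nonZero = 0;
        for (int value : vector) {
            if (value != 0) {
                nonZero++;
            }
        }
        return nonZero;
    }

    // Counts the dimensions of a dense vector that carry a non-zero value.
    static int countNonZero(float[] vector) {
        int nonZero = 0;
        for (float value : vector) {
            if (value != 0.0f) {
                nonZero++;
            }
        }
        return nonZero;
    }
}
```

With an eight-term vocabulary, a three-word text fills only three of the eight sparse dimensions, while a four-dimensional dense vector such as `{0.12f, -0.48f, 0.91f, 0.05f}` uses all four. With a realistic vocabulary of hundreds of thousands of terms the gap is far larger.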

Approximate Nearest Neighbor (ANN)

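Computing the exact distance between a query vector and every document vector is expensive. Approximate nearest neighbor algorithms trade a little accuracy for speed: a c-approximate search returns points whose distance from the query is at most c times the distance from the query to its true nearest neighbor. In practice this gives near-exact quality at much lower cost.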
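For reference, this is roughly what the exact search that ANN avoids looks like: a linear scan that computes the distance from the query to every vector and keeps the k smallest. It is a minimal sketch; the class and method names are illustrative, and Euclidean distance is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class ExactKnn {
    static double euclidean(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Returns the indexes of the k vectors closest to the query, nearest first.
    static List<Integer> search(float[][] vectors, float[] query, int k) {
        // One distance computation per indexed vector: this is the costly part.
        double[] distance = new double[vectors.length];
        for (int i = 0; i < vectors.length; i++) {
            distance[i] = euclidean(vectors[i], query);
        }

        Comparator<Integer> byDistance = Comparator.comparingDouble((Integer i) -> distance[i]);
        // Max-heap on distance: the head is the worst of the best k seen so far.
        PriorityQueue<Integer> worstFirst = new PriorityQueue<>(byDistance.reversed());
        for (int i = 0; i < vectors.length; i++) {
            worstFirst.add(i);
            if (worstFirst.size() > k) {
                worstFirst.poll();
            }
        }

        List<Integer> nearest = new ArrayList<>(worstFirst);
        nearest.sort(byDistance);
        return nearest;
    }
}
```

The work grows linearly with both the number of vectors and their dimension, which is why large indexes fall back on approximate structures.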

Hierarchical Navigable Small‑World (HNSW) Graph

Solr uses an HNSW graph, a proximity-based structure that connects each vector (a vertex) to its nearest neighbors, so that a search can move from vertex to vertex toward the query instead of scanning every vector. Graph construction is governed by hyper-parameters that control the number of connections per layer and the number of layers.
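The navigation idea can be sketched on a single layer: start at an entry vertex and keep moving to whichever neighbor is closest to the query, stopping when no neighbor improves on the current vertex. This is a deliberately simplified greedy walk, not Lucene's HNSW code. It returns a single approximate nearest neighbor, omits the candidate beam and the layer hierarchy, and assumes the adjacency lists are supplied by the caller.

```java
class GreedyGraphSearch {
    static double squaredDistance(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // neighbors[v] lists the vertices connected to vertex v.
    // Returns the vertex where the greedy walk stops.
    static int search(float[][] vectors, int[][] neighbors, float[] query, int entryPoint) {
        int current = entryPoint;
        double currentDistance = squaredDistance(vectors[current], query);
        while (true) {
            int best = current;
            double bestDistance = currentDistance;
            for (int candidate : neighbors[current]) {
                double d = squaredDistance(vectors[candidate], query);
                if (d < bestDistance) {
                    best = candidate;
                    bestDistance = d;
                }
            }
            if (best == current) {
                // No neighbor is closer to the query: a local minimum.
                return current;
            }
            current = best;
            currentDistance = bestDistance;
        }
    }
}
```

A walk like this can stop in a local minimum that is not the true nearest neighbor. Tracking several candidates at once and adding more connections per vertex reduce that risk at the price of more distance computations.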

Apache Lucene Implementation

At the time of writing, the Lucene implementation is single-layer; a layered (hierarchical) version is under development. The key classes are org.apache.lucene.document.KnnVectorField, the entry point for indexing a vector; org.apache.lucene.codecs.lucene90.Lucene90HnswVectorsFormat, the default vectors format; and org.apache.lucene.util.hnsw.HnswGraphBuilder, which builds the graph.

Apache Solr Implementation

Available from Solr 9.0 (Q1 2022), the contribution adds two features: the DenseVectorField type, a single-valued dense vector field, and a KNN query parser.

DenseVectorField Configuration

<fieldType name="knn_vector" class="solr.DenseVectorField" vectorDimension="4" similarityFunction="cosine"/>
<field name="vector" type="knn_vector" indexed="true" stored="true"/>

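Supported values for similarityFunction are euclidean, dot_product, and cosine. dot_product is an optimized way to compute cosine similarity and should only be used when all vectors, including the query vector, are normalized to unit length. The field can be indexed and stored, but it cannot be multi-valued.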
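The relationship between the three functions can be checked with plain arithmetic. The sketch below is not Solr or Lucene code, and the class name is illustrative. It computes each measure directly and lets you verify that, once both vectors are scaled to unit length, the dot product matches the cosine similarity of the original vectors.

```java
class VectorSimilarity {
    static double dot(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    static double norm(float[] a) {
        return Math.sqrt(dot(a, a));
    }

    // Cosine of the angle between a and b; independent of vector length.
    static double cosine(float[] a, float[] b) {
        return dot(a, b) / (norm(a) * norm(b));
    }

    static double euclidean(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Scales a vector to unit length, as dot-product similarity expects.
    static float[] normalize(float[] a) {
        double n = norm(a);
        float[] unit = new float[a.length];
        for (int i = 0; i < a.length; i++) {
            unit[i] = (float) (a[i] / n);
        }
        return unit;
    }
}
```

Normalizing once at index time and once per query removes the two norm computations from every comparison, which is where the saving of the dot-product variant comes from.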

Custom Codec Parameters

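To tune the HNSW hyper-parameters through a custom codec format, enable the schema-aware codec factory in solrconfig.xml and set the codec attributes on the field type in the schema:

<!-- solrconfig.xml: let the schema choose the codec format per field type -->
<codecFactory class="solr.SchemaCodecFactory"/>

<!-- schema: field type with an explicit vectors format and HNSW hyper-parameters -->
<fieldType name="knn_vector" class="solr.DenseVectorField" vectorDimension="4" similarityFunction="cosine" codecFormat="Lucene90HnswVectorsFormat" hnswMaxConnections="10" hnswBeamWidth="40"/>
<field name="vector" type="knn_vector" indexed="true" stored="true"/>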

Indexing Vectors

Vectors can be indexed via JSON, XML, or SolrJ (Java). The three examples below follow that order:

[{ "id": "1", "vector": [1.0, 2.5, 3.7, 4.1] }, { "id": "2", "vector": [1.5, 5.5, 6.7, 65.1] }]

<add>
  <doc>
    <field name="id">1</field>
    <field name="vector">1.0</field>
    <field name="vector">2.5</field>
    <field name="vector">3.7</field>
    <field name="vector">4.1</field>
  </doc>
</add>

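import java.util.Arrays;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.common.SolrInputDocument;

// getSolrClient() stands for however the application obtains its SolrClient.
final SolrClient client = getSolrClient();
// Each document carries one vector whose length matches vectorDimension (4).
final SolrInputDocument d1 = new SolrInputDocument();
d1.setField("id", "1");
d1.setField("vector", Arrays.asList(1.0f, 2.5f, 3.7f, 4.1f));
final SolrInputDocument d2 = new SolrInputDocument();
d2.setField("id", "2");
d2.setField("vector", Arrays.asList(1.5f, 5.5f, 6.7f, 65.1f));
// Send both documents in a single update request.
client.add(Arrays.asList(d1, d2));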

KNN Query Parser

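The KNN query parser finds the k documents nearest to a target vector in a given DenseVectorField. It takes two parameters: f, the DenseVectorField to search, and topK, the number of nearest neighbors to return (default 10). The target vector follows the local parameters: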

&q={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]

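The parser can be used as the main query together with filter queries (fq), or as the re-ranking query in Solr's re-rank stage. Its behavior has to be read in the context of Solr's ranking pipeline. When re-ranking, for example, a first-pass document receives a KNN score only if it is also among the topK nearest neighbors of the target vector in the whole index, so topK should be chosen with that in mind.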
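A client usually has to turn a float array into the local-params syntax shown above. The helper below is a hypothetical convenience, not part of SolrJ. It formats the main query and URL-encodes it together with an optional filter query. The class and method names are assumptions, and any filter such as `id:(1 2 3)` is just an arbitrary example value.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

class KnnQueryBuilder {
    // Formats {!knn f=<field> topK=<k>}[v1, v2, ...]
    static String knnQuery(String field, int topK, float[] target) {
        StringJoiner values = new StringJoiner(", ", "[", "]");
        for (float value : target) {
            values.add(Float.toString(value));
        }
        return "{!knn f=" + field + " topK=" + topK + "}" + values;
    }

    // Builds the URL-encoded q (and optional fq) request parameters.
    static String requestParams(String knnQuery, String filterQuery) {
        StringBuilder params = new StringBuilder("q=").append(encode(knnQuery));
        if (filterQuery != null) {
            params.append("&fq=").append(encode(filterQuery));
        }
        return params.toString();
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

Encoding matters here because the braces, brackets, and spaces of the local-params syntax are not safe in a raw URL.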

Important Notes

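Lucene index back-compatibility is only supported for the default codec. With a custom codec format, a future Solr upgrade may require either switching back to the default codec and rewriting the index before upgrading, or re-indexing from scratch afterwards. The hyper-parameters hnswMaxConnections and hnswBeamWidth correspond to M and efConstruction in the HNSW literature, which describes them in detail.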

For further details, refer to the Apache Solr Wiki and the cited papers.
