Neural Search in Apache Solr: Dense Vector Fields, HNSW Graphs, and K‑Nearest Neighbor Implementation

This article explains how Apache Solr and Lucene implement neural search using dense vector fields, hierarchical navigable small‑world (HNSW) graphs, and approximate K‑nearest neighbor algorithms, covering configuration, custom codecs, indexing formats, and query parsers for vector‑based retrieval.

Architects Research Society

Sease, through the work of Alessandro Benedetti (Apache Lucene/Solr PMC member and committer) and Elia Porciani (Sease R&D software engineer), contributed the first milestone of neural search in Apache Solr to the open‑source community. The contribution relies on Apache Lucene's K‑Nearest Neighbor (KNN) implementation.

Artificial Intelligence, Deep Learning and Vector Representations

Artificial intelligence (AI) refers to technologies that enable machines to learn and exhibit human‑like intelligence. Recent advances in computing power have revived interest in AI and brought it to fields such as software engineering and information retrieval. Deep learning, in particular, introduced deep neural networks that can solve complex problems, including encoding queries and documents as dense vectors.

Dense Vector Representations

Traditional inverted indexes model text as sparse vectors: each dimension corresponds to a term in the vocabulary, so for any single document most dimensions are zero. Dense vectors, in contrast, encode semantic meaning in a fixed space of much lower dimensionality, where most dimensions hold non‑zero values. Models such as BERT can encode text into dense vectors for retrieval.
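
To make the sparse side of this contrast concrete, here is a minimal sketch in plain Java. The class name SparseTermVector, the whitespace tokenizer, and the six‑term vocabulary are assumptions made for illustration; this is not how Lucene stores its inverted index.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SparseTermVector {
    // Maps a text to {dimension index -> term count}, where the dimension
    // index is the term's position in the vocabulary. Dimensions with a
    // zero count are simply absent, which is what makes the vector sparse.
    static Map<Integer, Integer> encode(String text, List<String> vocabulary) {
        Map<Integer, Integer> vector = new TreeMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            int dimension = vocabulary.indexOf(token);
            if (dimension >= 0) {
                vector.merge(dimension, 1, Integer::sum);
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        // Hypothetical toy vocabulary; a real one has as many entries as
        // there are distinct terms in the index.
        List<String> vocabulary = List.of("apache", "fast", "is", "lucene", "search", "solr");
        Map<Integer, Integer> sparse = encode("Solr is fast", vocabulary);
        System.out.println(sparse + " uses " + sparse.size() + " of " + vocabulary.size() + " dimensions");
    }
}
```

A dense vector for the same text would instead be a plain float array of fixed length, produced by a model, with a value in essentially every position.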

Approximate Nearest Neighbor (ANN)

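Computing the exact distance between a query vector and every document vector is expensive, since the cost grows linearly with the number of documents. ANN algorithms trade a little accuracy for speed: they return results whose distance from the query is at most c times the distance of the true nearest neighbor, for some factor c > 1. This gives near‑exact quality at much lower cost.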
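
The exact computation that ANN avoids can be sketched as a brute‑force scan. The class ExactKnn below is an illustrative assumption, not code from Lucene or Solr; it compares the query with every document vector and sorts by squared Euclidean distance.

```java
import java.util.Arrays;
import java.util.Comparator;

class ExactKnn {
    static double squaredEuclidean(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Returns the indexes of the k documents closest to the query.
    // Every document is visited, so the cost grows with the collection size.
    static int[] nearest(float[][] docs, float[] query, int k) {
        Integer[] ids = new Integer[docs.length];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i;
        }
        Arrays.sort(ids, Comparator.comparingDouble(i -> squaredEuclidean(docs[i], query)));
        int n = Math.min(k, ids.length);
        int[] top = new int[n];
        for (int i = 0; i < n; i++) {
            top[i] = ids[i];
        }
        return top;
    }

    public static void main(String[] args) {
        float[][] docs = {
            {1.0f, 2.5f, 3.7f, 4.1f},
            {1.5f, 5.5f, 6.7f, 65.1f}
        };
        float[] query = {1.0f, 2.0f, 3.0f, 4.0f};
        System.out.println(Arrays.toString(nearest(docs, query, 1)));
    }
}
```

An ANN index answers the same question without visiting every vector, at the price of occasionally missing a true neighbor.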

Hierarchical Navigable Small‑World (HNSW) Graph

Solr uses an HNSW graph, a proximity‑based structure that connects each vector (vertex) to its nearest neighbors. The graph construction is influenced by hyper‑parameters that control the number of connections per layer and the number of layers.
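
The idea of navigating a proximity graph can be sketched with a greedy walk over a single layer: start at an entry vertex and keep moving to whichever neighbor is closer to the query until no neighbor improves. The class GreedyGraphSearch and its adjacency‑array layout are assumptions for illustration. This is a deliberate simplification and not Lucene's implementation; an actual HNSW search tracks several candidates at once rather than a single current vertex.

```java
class GreedyGraphSearch {
    static double squaredDistance(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // neighbors[v] lists the vertices connected to vertex v.
    // Returns the vertex where the greedy walk stops (a local minimum of
    // the distance to the query, not necessarily the global one).
    static int search(float[][] vectors, int[][] neighbors, int entryPoint, float[] query) {
        int current = entryPoint;
        double best = squaredDistance(vectors[current], query);
        while (true) {
            int next = current;
            for (int candidate : neighbors[current]) {
                double d = squaredDistance(vectors[candidate], query);
                if (d < best) {
                    best = d;
                    next = candidate;
                }
            }
            if (next == current) {
                return current; // no neighbor is closer: stop here
            }
            current = next;
        }
    }

    public static void main(String[] args) {
        // Five points on a line, each linked to its immediate neighbors.
        float[][] vectors = {{0f}, {1f}, {2f}, {3f}, {4f}};
        int[][] neighbors = {{1}, {0, 2}, {1, 3}, {2, 4}, {3}};
        System.out.println(search(vectors, neighbors, 0, new float[] {3.2f}));
    }
}
```

The number of connections per vertex governs how many candidates each step inspects, which is why that hyper‑parameter affects both index size and search quality.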

Apache Lucene Implementation

At the time of writing, the Lucene implementation is single‑layer; a layered (hierarchical) version is under development. Key classes are org.apache.lucene.document.KnnVectorField (the entry point), org.apache.lucene.codecs.lucene90.Lucene90HnswVectorsFormat (the default vectors format), and org.apache.lucene.util.hnsw.HnswGraphBuilder (which builds the graph).

Apache Solr Implementation

Available from Solr 9.0 (Q1 2022), the contribution adds two features: DenseVectorField, a single‑valued dense vector field type, and a KNN query parser.

DenseVectorField Configuration

<fieldType name="knn_vector" class="solr.DenseVectorField" vectorDimension="4" similarityFunction="cosine"/>
<field name="vector" type="knn_vector" indexed="true" stored="true"/>

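Supported values for similarityFunction are euclidean, dot_product, and cosine. dot_product is intended as an optimized form of cosine similarity and requires all vectors, document and query alike, to have unit length. The field can be indexed and stored but cannot be multi‑valued.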
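
The relationship between the three functions can be checked with a few lines of plain Java. The class VectorSimilarity is an assumption for illustration and computes the raw mathematical quantities; it does not reproduce how Lucene turns them into scores.

```java
class VectorSimilarity {
    static double dotProduct(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    static double euclideanDistance(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Undefined (NaN) if either vector has zero length.
    static double cosine(float[] a, float[] b) {
        return dotProduct(a, b) / (Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b)));
    }

    // Scales a non-zero vector to unit length.
    static float[] normalize(float[] v) {
        double norm = Math.sqrt(dotProduct(v, v));
        float[] unit = new float[v.length];
        for (int i = 0; i < v.length; i++) {
            unit[i] = (float) (v[i] / norm);
        }
        return unit;
    }

    public static void main(String[] args) {
        float[] doc = {1.0f, 2.5f, 3.7f, 4.1f};
        float[] query = {1.0f, 2.0f, 3.0f, 4.0f};
        System.out.println("cosine      = " + cosine(doc, query));
        // On unit-length vectors the dot product matches the cosine,
        // up to float rounding, without computing any norms at query time.
        System.out.println("dot on unit = " + dotProduct(normalize(doc), normalize(query)));
    }
}
```

Normalizing vectors once, before indexing and before querying, is what lets dot_product stand in for cosine.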

Custom Codec Parameters

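To use a custom codec and tune the HNSW hyper‑parameters, enable the schema‑aware codec factory in solrconfig.xml and set the advanced attributes on the field type in the schema. hnswMaxConnections controls how many of the nearest neighbor candidates are connected to each new node, and hnswBeamWidth is the number of nearest neighbor candidates tracked while searching the graph for each newly inserted node: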

<codecFactory class="solr.SchemaCodecFactory"/>
<fieldType name="knn_vector" class="solr.DenseVectorField" vectorDimension="4" similarityFunction="cosine" codecFormat="Lucene90HnswVectorsFormat" hnswMaxConnections="10" hnswBeamWidth="40"/>
<field name="vector" type="knn_vector" indexed="true" stored="true"/>

Indexing Vectors

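Vectors can be indexed via JSON, XML, or SolrJ (Java).

JSON:

[{ "id": "1", "vector": [1.0, 2.5, 3.7, 4.1] },
 { "id": "2", "vector": [1.5, 5.5, 6.7, 65.1] }]

XML:

<add>
<doc>
<field name="id">1</field>
<field name="vector">1.0</field>
<field name="vector">2.5</field>
<field name="vector">3.7</field>
<field name="vector">4.1</field>
</doc>
</add>

SolrJ:

import java.util.Arrays;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.common.SolrInputDocument;

// getSolrClient() stands for however the application obtains its SolrClient.
final SolrClient client = getSolrClient();
final SolrInputDocument d1 = new SolrInputDocument();
d1.setField("id", "1");
d1.setField("vector", Arrays.asList(1.0f, 2.5f, 3.7f, 4.1f));
final SolrInputDocument d2 = new SolrInputDocument();
d2.setField("id", "2");
d2.setField("vector", Arrays.asList(1.5f, 5.5f, 6.7f, 65.1f));
// Send both documents in a single update request.
client.add(Arrays.asList(d1, d2));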

KNN Query Parser

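The knn query parser finds the k documents nearest to a target vector, using the vectors indexed in a specified DenseVectorField. It takes two parameters: f, the DenseVectorField to search (required), and topK, the number of nearest neighbors to return (default 10).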

&q={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]

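The parser can be used as the main query together with filter queries (fq), or as a re‑rank query. In the re‑ranking case, pay attention to topK: a first‑pass document gets a second‑pass score only if it is among the topK nearest neighbors of the target vector in the whole index.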
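
A client usually has the target vector as a float array and needs to turn it into these request parameters. The helper class KnnQueryStrings below is a hypothetical sketch in plain Java (it is not part of SolrJ); it only builds parameter strings, using the knn syntax shown above and Solr's standard fq and rq/rerank parameters. The first‑pass query and filter values are placeholders.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

class KnnQueryStrings {
    // Builds e.g. {!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]
    static String knnQuery(String field, int topK, float[] target) {
        return "{!knn f=" + field + " topK=" + topK + "}" + Arrays.toString(target);
    }

    // knn as the main query, restricted by a filter query.
    static Map<String, String> withFilter(String field, int topK, float[] target, String filterQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", knnQuery(field, topK, target));
        params.put("fq", filterQuery);
        return params;
    }

    // knn as the second pass: the top reRankDocs results of the first-pass
    // query are re-scored with the knn query referenced through $rqq.
    static Map<String, String> asReRank(String firstPassQuery, int reRankDocs,
                                        String field, int topK, float[] target) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", firstPassQuery);
        params.put("rq", "{!rerank reRankQuery=$rqq reRankDocs=" + reRankDocs + " reRankWeight=1}");
        params.put("rqq", knnQuery(field, topK, target));
        return params;
    }

    public static void main(String[] args) {
        float[] target = {1.0f, 2.0f, 3.0f, 4.0f};
        System.out.println(withFilter("vector", 10, target, "id:(1 2 3)"));
        System.out.println(asReRank("id:(3 4 9 2)", 4, "vector", 10, target));
    }
}
```

The values are unencoded; they still need URL encoding when sent as HTTP request parameters, which a Solr client library normally handles.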

Important Notes

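Lucene index back‑compatibility is only supported for the default codec. If a custom codec format is configured, upgrading to a future Solr version may require either switching back to the default codec and rewriting the index, or re‑indexing from scratch. The HNSW hyper‑parameters are described in the referenced literature.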

For further details, refer to the Apache Solr Wiki and the cited papers.