
Neighbor Transformer (NFormer): Robust Person Re-identification via Interactive Multi‑image Modeling

Neighbor Transformer (NFormer) introduces interactive multi‑image modeling for person re‑identification, using Landmark Agent Attention and Reciprocal Neighbor Softmax to efficiently fuse features across images, achieving state‑of‑the‑art accuracy and tighter embedding clusters on multiple benchmark datasets.

Xiaohongshu Tech REDtech

At CVPR 2022, the Xiaohongshu multimodal algorithm team introduced Neighbor Transformer (NFormer), a novel network for person re-identification. Instead of processing each image independently, it uses a transformer to model multiple input images jointly. This interactive modeling yields more robust feature representations.

NFormer incorporates two key modules, Landmark Agent Attention and Reciprocal Neighbor Softmax, to substantially reduce the computational cost of multi-image interaction. Landmark Agent Attention uses a small set of sampled landmarks in the feature space to build a low-rank factorization of the relation map between images. Reciprocal Neighbor Softmax constructs a sparse affinity matrix that retains only highly related neighbor relationships.
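The two modules can be illustrated with a minimal pure-Python sketch. It is not the paper's implementation, and every name in it (`landmark_affinity`, `reciprocal_neighbor_softmax`, `aggregate`, `num_landmarks`, `k`) is hypothetical. The landmark step is a simplifying assumption. It randomly samples l input features as landmarks and projects every feature onto them to obtain an l-dimensional "agent" vector. Affinities are then dot products of these agent vectors, so the resulting matrix has rank at most l. The reciprocal step keeps entry (i, j) only when each image is among the other's top-k neighbors. As an additional assumption, it always keeps the diagonal so that no softmax row is empty.

```python
import math
import random


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def landmark_affinity(feats, num_landmarks, seed=0):
    """Rank-limited N x N affinity built through sampled landmarks.

    Hypothetical simplification: each feature is projected onto
    `num_landmarks` randomly sampled input features, giving an l-dim
    "agent" vector; pairwise affinity is the dot product of agents,
    so the matrix has rank at most l.
    """
    rng = random.Random(seed)
    landmarks = rng.sample(feats, num_landmarks)
    agents = [[dot(f, m) for m in landmarks] for f in feats]  # N x l
    return [[dot(a, b) for b in agents] for a in agents]       # N x N


def top_k(row, k):
    """Indices of the k largest entries of one affinity row."""
    order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
    return set(order[:k])


def reciprocal_neighbor_softmax(affinity, k):
    """Row-wise softmax restricted to reciprocal k-nearest neighbors.

    Entry (i, j) is kept only if j is in i's top-k and i is in j's top-k.
    The diagonal is always kept (an assumption) so no row is empty.
    Returns a dense N x N weight matrix with exact zeros outside the mask.
    """
    n = len(affinity)
    knn = [top_k(row, k) for row in affinity]
    weights = []
    for i in range(n):
        keep = [j for j in range(n) if j == i or (j in knn[i] and i in knn[j])]
        peak = max(affinity[i][j] for j in keep)  # subtract max for stability
        exps = {j: math.exp(affinity[i][j] - peak) for j in keep}
        total = sum(exps.values())
        weights.append([exps.get(j, 0.0) / total for j in range(n)])
    return weights


def aggregate(feats, weights):
    """Fuse features: each output is the weighted sum of its kept neighbors."""
    dim = len(feats[0])
    return [
        [sum(w[j] * feats[j][d] for j in range(len(feats))) for d in range(dim)]
        for w in weights
    ]


if __name__ == "__main__":
    # Hypothetical 2-D backbone features: images 0-2 and 3-5 are two people.
    feats = [[1.0, 0.1], [0.9, 0.2], [0.95, 0.15],
             [0.1, 1.0], [0.2, 0.9], [0.15, 0.95]]
    aff = landmark_affinity(feats, num_landmarks=3)
    w = reciprocal_neighbor_softmax(aff, k=2)
    for row in aggregate(feats, w):
        print([round(x, 3) for x in row])
```

For readability, this sketch still materializes the full N x N matrix. The point is only to show the two ideas: the affinity is factorized through l landmarks, and each row receives at most k + 1 nonzero weights.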

Experimental results on several public person re-identification datasets demonstrate that NFormer achieves state‑of‑the‑art (SOTA) performance. The method can be easily combined with existing approaches to further boost accuracy.

The pipeline starts with a convolutional backbone that extracts deep features from each query image. Pairwise similarity between image features is computed to form a similarity matrix, which is then used for feature fusion, producing the final representation for retrieval.
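Without any efficiency tricks, the pipeline described above amounts to a dense interaction in which every feature attends to every other feature. The minimal pure-Python sketch below makes two assumptions that do not come from the paper: cosine similarity as the affinity and a temperature-scaled softmax as the fusion weights. The function names and the `temperature` value are likewise hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def similarity_matrix(feats):
    """Pairwise cosine similarity between all N feature vectors."""
    unit = [l2_normalize(f) for f in feats]
    return [[sum(a * b for a, b in zip(u, v)) for v in unit] for u in unit]


def softmax(row, temperature):
    """Numerically stable softmax with a (hypothetical) temperature."""
    peak = max(row)
    exps = [math.exp((x - peak) / temperature) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_features(feats, temperature=0.1):
    """Replace each feature by a similarity-weighted average of all features."""
    sim = similarity_matrix(feats)
    n, dim = len(feats), len(feats[0])
    fused = []
    for row in sim:
        w = softmax(row, temperature)
        fused.append([sum(w[j] * feats[j][d] for j in range(n)) for d in range(dim)])
    return fused


def rank_gallery(query, gallery):
    """Gallery indices sorted by cosine similarity to the query, best first."""
    q = l2_normalize(query)
    scores = [sum(a * b for a, b in zip(q, l2_normalize(g))) for g in gallery]
    return sorted(range(len(gallery)), key=lambda j: scores[j], reverse=True)


if __name__ == "__main__":
    # Hypothetical backbone features: images 0-1 and 2-3 show two people.
    feats = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
    fused = fuse_features(feats)
    print(rank_gallery(fused[0], fused[1:]))
```

In this toy setup there are two hypothetical identities with two images each. Fusion pulls the two features of the same identity closer together, and ranking a fused query against the remaining fused features returns its same-identity partner first. For N images, this dense version requires O(N^2) pairwise similarity computations, which is the cost the Landmark Agent Attention and Reciprocal Neighbor Softmax modules are meant to reduce.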

t-SNE visualizations show that, after NFormer processing, the embeddings of each person form tighter clusters with fewer outliers, which makes re-identification more reliable.

The paper (arXiv:2204.09331) provides detailed analysis of the proposed modules and reports consistent improvements over baseline models across four benchmark datasets.
