Neighbor Transformer (NFormer): Robust Person Re-identification via Interactive Multi‑image Modeling

Neighbor Transformer (NFormer) introduces interactive multi‑image modeling for person re‑identification, using Landmark Agent Attention and Reciprocal Neighbor Softmax to efficiently fuse features across images, achieving state‑of‑the‑art accuracy and tighter embedding clusters on multiple benchmark datasets.

Xiaohongshu Tech REDtech

Jun 13, 2022

At CVPR 2022, the Xiaohongshu multimodal algorithm team introduced Neighbor Transformer (NFormer), a network for person re-identification that uses a transformer to model multiple input images jointly, rather than processing each image independently. This interactive modeling yields more robust feature representations.

NFormer incorporates two key modules—Landmark Agent Attention and Reciprocal Neighbor Softmax—to dramatically reduce the computational cost of multi‑image interaction. Landmark Agent Attention uses a set of sampled landmarks to perform low‑rank decomposition of the feature space, while Reciprocal Neighbor Softmax constructs a sparse affinity matrix that retains only highly related neighbor relationships.
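The two modules can be illustrated with a small, dependency-free sketch. It is not the paper's implementation: the function names, the random landmark sampling, the plain dot-product affinity, and the top-k reciprocity rule used to decide which neighbors count as "highly related" are all assumptions made for illustration.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def landmark_affinity(queries, keys, num_landmarks, seed=0):
    """Approximate an N x N affinity through l sampled landmarks.

    Hypothetical helper: every query and key is compared only with the
    landmarks (two N x l maps). The affinity is the product of those maps,
    so its rank is at most l and the N x N dot products are never formed
    directly from the full features.
    """
    rng = random.Random(seed)
    picked = rng.sample(range(len(keys)), num_landmarks)  # assumed: uniform sampling
    landmarks = [keys[i] for i in picked]
    q_map = [[dot(q, z) for z in landmarks] for q in queries]
    k_map = [[dot(k, z) for z in landmarks] for k in keys]
    return [[dot(q_row, k_row) for k_row in k_map] for q_row in q_map]


def reciprocal_neighbor_softmax(affinity, k):
    """Row-wise softmax restricted to mutual top-k neighbors.

    Entry (i, j) survives only if j is among i's top-k AND i is among
    j's top-k (assumed criterion); all other weights are exactly zero.
    """
    n = len(affinity)
    topk = [
        set(sorted(range(n), key=lambda j: row[j], reverse=True)[:k])
        for row in affinity
    ]
    weights = []
    for i, row in enumerate(affinity):
        keep = [j for j in topk[i] if i in topk[j]]
        out = [0.0] * n
        if keep:
            peak = max(row[j] for j in keep)  # subtract max for stability
            exps = {j: math.exp(row[j] - peak) for j in keep}
            total = sum(exps.values())
            for j, e in exps.items():
                out[j] = e / total
        weights.append(out)
    return weights
```

In this sketch the landmark step controls how the affinity is computed, while the reciprocal mask controls which entries of it are allowed to contribute; the two are independent and could be applied in sequence.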

Experimental results on several public person re-identification datasets demonstrate that NFormer achieves state‑of‑the‑art (SOTA) performance. The method can be easily combined with existing approaches to further boost accuracy.

The pipeline starts with a convolutional backbone that extracts deep features from each input image. Pairwise similarities between these features form a similarity matrix, which then weights the fusion of features across images to produce the final representation used for retrieval.
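The fusion step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: it takes backbone features as given, uses cosine similarity, and fuses with a dense softmax-weighted average; the `fuse_features` name and the `temperature` parameter are hypothetical.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def fuse_features(features, temperature=0.1):
    """Replace each feature with a similarity-weighted average of all features."""
    unit = [l2_normalize(f) for f in features]
    dim = len(unit[0])
    fused = []
    for u in unit:
        # One row of the pairwise similarity matrix (cosine, scaled).
        sims = [sum(a * b for a, b in zip(u, v)) / temperature for v in unit]
        peak = max(sims)  # subtract max for numerical stability
        exps = [math.exp(s - peak) for s in sims]
        total = sum(exps)
        row_weights = [e / total for e in exps]
        # Feature fusion: weighted sum over every image's feature.
        fused.append(
            [sum(w * v[d] for w, v in zip(row_weights, unit)) for d in range(dim)]
        )
    return fused


# Two similar features (same identity, say) and one dissimilar feature.
features = [[1.0, 0.1], [0.9, 0.2], [0.0, 1.0]]
fused = fuse_features(features)
```

With these toy inputs, the two similar features are pulled toward each other while the dissimilar one stays almost unchanged, which is the intuition behind fusing features across images before retrieval.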

Visualization (t‑SNE) shows that after NFormer processing, person embeddings are more tightly clustered with fewer outliers, facilitating more reliable re-identification.

The paper (arXiv:2204.09331) provides detailed analysis of the proposed modules and reports consistent improvements over baseline models across four benchmark datasets.
