Meituan Technical Team's 8 CVPR 2023 Papers: Overview and Insights

This article reviews eight CVPR 2023 papers from Meituan’s technology team. They span domain adaptation, 3D reconstruction, multi‑view 3D object detection, GAN‑based self‑supervised pre‑training, federated learning, RGB‑T tracking, vision‑language navigation, and visual‑textual layout generation. For each paper, the summary covers the method, the experiments, and the reported performance gains.

Meituan Technology Team

Jun 15, 2023

CVPR 2023 Overview

CVPR (IEEE Conference on Computer Vision and Pattern Recognition) is a top‑tier computer vision conference, first held in 1983. In Google Scholar’s 2022 ranking, CVPR placed fourth among all scholarly venues, behind Nature, NEJM and Science. In 2023 the conference received 9,155 submissions and accepted 2,360 papers, an acceptance rate of 25.78%.

01 Divide and Adapt: Active Domain Adaptation via Customized Learning

Authors: Huang Duojun, Li Jichang, Chen Weikai, Huang Junshi, Chai Zhenhua, Li Guanbin

Key contribution: The paper proposes a sampling policy that selects informative unlabeled target samples by jointly considering domain discrepancy and model uncertainty, together with a customized loss that constrains each subset of samples according to its transferability. This improves robustness across unsupervised (UDA), semi‑supervised (SSDA) and source‑free (SFDA) domain adaptation scenarios.
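To make the idea of scoring samples by both uncertainty and domain discrepancy concrete, here is a minimal Python sketch. It is not the paper's selection function: the product of prediction entropy and a `target_likeness` score, and all names below, are assumptions chosen for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy of a class-probability vector (a proxy for model uncertainty)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_for_labeling(class_probs, target_likeness, budget):
    """Return indices of the `budget` highest-scoring unlabeled target samples.

    class_probs[i]     -- softmax output of the classifier for sample i
    target_likeness[i] -- hypothetical score in [0, 1]; higher means the sample
                          looks less like source data (larger domain gap)
    """
    # Assumed scoring rule: uncertain samples far from the source domain rank first.
    scores = [entropy(p) * g for p, g in zip(class_probs, target_likeness)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]

probs = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.34, 0.33, 0.33],  # very uncertain prediction
    [0.50, 0.49, 0.01],
    [0.40, 0.30, 0.30],
]
gap = [0.9, 0.8, 0.2, 0.1]
picked = select_for_labeling(probs, gap, budget=2)
print(picked)
```

A confident sample is skipped even when its domain gap is large, because either factor being small pulls the score down. A real system would replace both scores with quantities estimated by the adaptation model.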

Evidence: Experiments on multiple domain‑adaptation benchmarks show that the method achieves the best reported performance in the UDA, SSDA and SFDA settings.

Paper PDF: https://openaccess.thecvf.com/content/CVPR2023/papers/Huang_Divide_and_Adapt_Active_Domain_Adaptation_via_Customized_Learning_CVPR_2023_paper.pdf

02 Efficient Second‑Order Plane Adjustment

Author: Zhou Lipu

Key contribution: The paper derives a closed‑form expression for the Hessian of the plane‑adjustment (PA) least‑squares problem and builds a Newton‑based second‑order optimizer in which the plane parameters are eliminated analytically. Because this reduces the number of variables and keeps each plane optimal at every iteration, the method converges faster than Levenberg‑Marquardt approximations.
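The elimination of plane parameters can be illustrated with the standard point-to-plane least-squares formulation. The notation below (planes indexed by i, poses by j, points by k) is an assumption for illustration and may differ from the paper's.

```latex
% Point-to-plane cost over poses (R_j, t_j) and planes (n_i, d_i), with \|n_i\| = 1
C = \sum_i \sum_j \sum_k \left( n_i^\top (R_j p_{ijk} + t_j) + d_i \right)^2

% For fixed poses, let q_{ijk} = R_j p_{ijk} + t_j, let \bar{q}_i be the mean
% of all q_{ijk} observed on plane i, and define the scatter matrix
S_i = \sum_{j,k} (q_{ijk} - \bar{q}_i)(q_{ijk} - \bar{q}_i)^\top

% Minimizing over d_i and then over n_i gives the optimal plane in closed form:
d_i^\ast = -\, n_i^{\ast\top} \bar{q}_i, \qquad
n_i^\ast = \text{unit eigenvector of } S_i \text{ for } \lambda_{\min}(S_i)

% Substituting back leaves a problem in the poses alone:
\min_{\{R_j,\, t_j\}} \; \sum_i \lambda_{\min}(S_i)
```

With the planes removed, a second-order method needs the gradient and Hessian of the smallest eigenvalue with respect to the poses. The summary above attributes a closed-form expression for that Hessian to the paper.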

Evidence: Empirical results on depth‑sensor 3D reconstruction tasks demonstrate superior speed and accuracy compared with state‑of‑the‑art (SOTA) methods.

Paper PDF: https://openaccess.thecvf.com/content/CVPR2023/papers/Zhou_Efficient_Second-Order_Plane_Adjustment_CVPR_2023_paper.pdf

03 AeDet: Azimuth‑Invariant Multi‑view 3D Object Detection

Authors: Feng Chengjian, Xie Zequn, Zhong Yujie, Chu Xiangxiang, Ma Lin

Key contribution: Introduces azimuth‑equivariant convolution (AeConv) and azimuth‑equivariant anchors that preserve the radial symmetry of bird’s‑eye‑view (BEV) features. A camera‑decoupled virtual depth module unifies depth predictions across cameras.
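One way to picture an azimuth-equivariant convolution is a kernel whose sampling grid is rotated by the azimuth of each BEV cell, so that the kernel always "looks" along the radial direction. The sketch below computes such rotated offsets only. The function name, the axis convention, and the use of `atan2` from the ego position are assumptions, not AeDet's actual implementation.

```python
import math

def rotated_offsets(x, y, kernel_size=3):
    """Sampling offsets of a k x k kernel rotated by the azimuth of BEV cell (x, y).

    (x, y) is measured from the ego vehicle, assumed to sit at the BEV origin.
    Offsets are returned row by row as (dx, dy) pairs in BEV coordinates.
    """
    azimuth = math.atan2(y, x)
    cos_a, sin_a = math.cos(azimuth), math.sin(azimuth)
    half = kernel_size // 2
    offsets = []
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            # Standard 2D rotation of the regular grid offset (dx, dy).
            offsets.append((dx * cos_a - dy * sin_a, dx * sin_a + dy * cos_a))
    return offsets

ahead = rotated_offsets(5.0, 0.0)  # azimuth 0: the regular grid is unchanged
side = rotated_offsets(0.0, 5.0)   # azimuth 90 degrees: the grid is turned a quarter
print(ahead[5], side[5])
```

The rotated offsets are generally not integers, so an implementation along these lines would have to sample the feature map with interpolation, in the manner of deformable convolution.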

Result: On the nuScenes benchmark, AeDet achieves 62.0% NDS, surpassing existing multi‑view 3D detectors.

Paper PDF: https://arxiv.org/pdf/2211.12501.pdf

04 Masked Auto‑Encoders Meet Generative Adversarial Networks

Authors: Fei Zhengcong, Fan Mingyuan, Zhu Li, Huang Junshi, Wei Xiaoming, Wei Xiaolin

Key contribution: Combines MAE pre‑training with a GAN framework. The generator predicts the masked patches. The generated patches are then concatenated with the visible patches, and the discriminator judges for each patch whether it is real or generated.
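The per-patch discrimination can be sketched as a binary classification target over all patches of one image. This is a simplified illustration in plain Python: the function names and the use of binary cross-entropy are assumptions, and the actual training objective in the paper may differ.

```python
import math

def discriminator_targets(num_patches, masked_idx):
    """Per-patch targets: 1.0 for visible (real) patches, 0.0 for generated ones."""
    masked = set(masked_idx)
    return [0.0 if i in masked else 1.0 for i in range(num_patches)]

def patch_bce(pred_real_prob, targets, eps=1e-7):
    """Mean binary cross-entropy over all patches of one image."""
    total = 0.0
    for p, t in zip(pred_real_prob, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(targets)

# Four patches; patches 1 and 3 were masked and filled in by the generator.
targets = discriminator_targets(4, masked_idx=[1, 3])
good = patch_bce([0.9, 0.1, 0.9, 0.1], targets)  # discriminator is mostly right
bad = patch_bce([0.1, 0.9, 0.1, 0.9], targets)   # discriminator is mostly wrong
print(targets, good < bad)
```

In an adversarial setup the generator would be trained to push the discriminator's outputs on the masked positions toward "real", in addition to any reconstruction loss.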

Evidence: The MAE‑GAN framework improves reconstruction quality and learns stronger visual representations. On ImageNet‑1k, a ViT‑B model pretrained for 200 epochs with MAE‑GAN outperforms a vanilla MAE‑B model trained for 1,600 epochs in downstream classification.

Paper PDF: https://feizc.github.io/resume/ganmae.pdf

05 Elastic Aggregation for Federated Optimization

Authors: Chen Dengsheng, Hu Jie, Vince Junkai Tan, Wei Xiaoming, Wu Enhua

Problem: Client drift in federated learning caused by heterogeneous data distributions leads to slow convergence and sub‑optimal final models.

Method: Elastic Aggregation computes parameter sensitivity on each client using unlabeled data, then weights the global model update by these sensitivities. The authors describe it as the first FL method that fully exploits unlabeled data.
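A sensitivity-weighted aggregation step might look like the following sketch. Several things here are assumptions rather than details from the summary: that updates to more sensitive parameters are damped, the scaling rule `1 + mu - s / max(s)`, the hyperparameter `mu`, and all function and variable names. Sensitivities are taken as given inputs, since how each client estimates them is not described above.

```python
def elastic_aggregate(global_params, client_deltas, client_sensitivities, mu=0.5):
    """Apply the averaged client update, scaled per parameter by its sensitivity.

    global_params        -- list of floats, current server model
    client_deltas        -- one list of parameter updates per client
    client_sensitivities -- one list of non-negative sensitivities per client
    mu                   -- hypothetical floor on the per-parameter step scale
    """
    n_clients = len(client_deltas)
    n_params = len(global_params)
    mean_delta = [sum(d[k] for d in client_deltas) / n_clients for k in range(n_params)]
    mean_sens = [sum(s[k] for s in client_sensitivities) / n_clients for k in range(n_params)]
    max_sens = max(mean_sens) or 1.0  # avoid division by zero when all are 0
    # Assumed rule: the most sensitive parameter moves by mu times its update,
    # a parameter with zero sensitivity by (1 + mu) times its update.
    scale = [1.0 + mu - s / max_sens for s in mean_sens]
    return [g + z * d for g, z, d in zip(global_params, scale, mean_delta)]

new_params = elastic_aggregate(
    global_params=[0.0, 0.0],
    client_deltas=[[1.0, 1.0], [1.0, 1.0]],
    client_sensitivities=[[4.0, 0.0], [4.0, 0.0]],
)
print(new_params)
```

Both parameters receive the same raw update, but the sensitive one moves less, which is one way to keep the server model from drifting toward any single client's distribution.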

Result: Experiments show significant performance gains on both visual and textual tasks in federated settings.

Paper PDF: https://openreview.net/pdf?id=EWjYk3R2jhr

06 Bridging Search Region Interaction with Template for RGB‑T Tracking

Authors: Hui Tianrui, Xun Zizheng, Peng Fengguang, Huang Junshi, Wei Xiaoming, Wei Xiaolin, Dai Jiao, Han Jizhong, Liu Si

Key contribution: The Template‑Bridged Search‑Region Interaction (TBSI) module uses the template as a mediator to exchange contextual information between the RGB and thermal‑infrared (TIR) search regions. The template is updated with the resulting multimodal context, and the module is inserted into a ViT backbone.

Result: A tracker with TBSI integrated achieves state‑of‑the‑art performance on three RGB‑T tracking datasets.

Paper PDF: https://openaccess.thecvf.com/content/CVPR2023/papers/Hui_Bridging_Search_Region_Interaction_With_Template_for_RGB-T_Tracking_CVPR_2023_paper.pdf

07 Adaptive Zone‑Aware Hierarchical Planner for Vision‑Language Navigation

Authors: Gao Chen, Peng Xingyu, Yan Mi, Wang He, Yang Lirong, Ren Haibing, Li Hongsheng, Liu Si

Problem: Existing VLN methods use a single‑step planner, limiting the ability to set and achieve sub‑goals.

Method: Adaptive Zone‑Aware Hierarchical Planner (AZHP) splits navigation into high‑level zone partition (Scene‑Aware Adaptive Zone Partition, SZP) and low‑level action execution. A State Switching Module (SSM) asynchronously triggers the two levels. Goal Zone Selection (GZS) chooses appropriate zones for the current sub‑goal. Hierarchical reinforcement learning (HRL) and auxiliary supervision train the framework.
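The asynchronous interplay between the two planning levels can be shown with a deliberately tiny toy. Everything in it is hypothetical: the 1-D corridor, fixed-size zones, and hand-written rules stand in for the learned partition, zone selection, and action policies, and none of it reflects the paper's actual modules.

```python
def zone_of(node, zone_size=3):
    """Toy zone partition: consecutive nodes on a corridor share a zone."""
    return node // zone_size

def hierarchical_navigate(start, target, zone_size=3, max_steps=50):
    """Walk a 1-D corridor with a two-level planner; returns (path, zone_decisions)."""
    node, goal_zone = start, None
    path, zone_decisions = [start], []
    for _ in range(max_steps):
        if node == target:
            break
        # State switching: call the high-level planner only when no sub-goal
        # is active or the agent is inside the current sub-goal zone.
        if goal_zone is None or zone_of(node, zone_size) == goal_zone:
            current = zone_of(node, zone_size)
            final = zone_of(target, zone_size)
            # Goal zone selection: the next zone toward the target zone (or stay).
            goal_zone = current + (final > current) - (final < current)
            zone_decisions.append(goal_zone)
        # Low-level action: one step toward the goal zone, or toward the
        # target itself once the agent is inside the goal zone.
        if zone_of(node, zone_size) == goal_zone:
            node += 1 if target > node else -1
        else:
            node += 1 if goal_zone > zone_of(node, zone_size) else -1
        path.append(node)
    return path, zone_decisions

path, zone_decisions = hierarchical_navigate(start=0, target=7)
print(path, zone_decisions)
```

The point of the toy is the call pattern: the low-level policy acts at every step, while the high-level planner is consulted only when the switching condition fires, so it makes fewer decisions than there are steps.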

Result: AZHP attains top performance on REVERIE, SOON and R2R datasets.

Paper PDF: https://openaccess.thecvf.com/content/CVPR2023/papers/Gao_Adaptive_Zone-Aware_Hierarchical_Planner_for_Vision-Language_Navigation_CVPR_2023_paper.pdf

08 PosterLayout: A New Benchmark and Approach for Content‑Aware Visual‑Textual Presentation Layout

Authors: Xu Xiaoyuan, He Xiangteng, Peng Yuxin, Kong Hao, Zhang Qing

Problem: Existing layout‑generation methods ignore the interaction between the image canvas and textual elements, leading to incompatibility.

Dataset: PosterLayout, a benchmark covering source‑domain diversity, theme diversity and layout complexity.

Method: Design Sequence GAN models human‑like design sequences. It introduces Design Sequence Formation to convert layout generation into a temporally ordered sequence, using canvas visual features as the initial state.

Result: Experiments show the method outperforms prior approaches on the PosterLayout benchmark.

Paper PDF: https://arxiv.org/pdf/2303.15937.pdf
