Co-Action Network: A Feature Interaction Model for Click‑Through Rate Prediction
The Co‑Action Network (CAN) replaces costly Cartesian‑product feature crossing with lightweight micro‑net‑based interaction units that share parameters across feature pairs, delivering comparable CTR‑prediction accuracy with roughly one‑tenth the parameters and lower online latency, as demonstrated in a large‑scale advertising deployment.
The ranking module is crucial in advertising, recommendation, and search systems because ranking results directly affect user experience. Click‑through rate (CTR) prediction is a core task in these systems, and feature crossing is an essential technique for improving CTR models.
Background: Traditional CTR models have evolved from linear models (LR, FM, MLR) to deep neural networks (DNN) that follow an Embedding&MLP paradigm. Two main families of feature‑crossing methods have emerged: (1) non‑parameterized approaches such as explicit feature‑cross IDs and Cartesian products, and (2) parameterized approaches that implicitly learn non‑linear feature interactions via model parameters (e.g., DeepFM, IPNN, ONN, DCN, xDeepFM, FiBiNET). While Cartesian‑product‑based methods achieve strong performance, they suffer from massive parameter explosion and high online latency.
Problem and Method: The authors seek a parameterized solution that retains the expressive power of Cartesian products without their drawbacks. They propose the Co‑Action Network (CAN), which introduces a micro‑net‑based feature‑interaction unit (Co‑Action Unit). Each feature pair is assigned a lightweight micro‑net that learns interaction weights from the IDs of the two features. The micro‑net parameters are shared across similar feature pairs, dramatically reducing the total parameter count while preserving the ability to model high‑order interactions.
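A back‑of‑the‑envelope comparison shows why the Cartesian product explodes while a CAN‑style scheme does not. The vocabulary sizes and dimensions below are hypothetical, chosen only for illustration; they are not numbers from the paper:

```python
# Hypothetical sizes for illustration only (not from the paper).
n_user_ids = 1_000_000   # user-side feature vocabulary
n_item_ids = 100_000     # item-side feature vocabulary
dim = 16                 # base embedding dimension

# Cartesian product: one embedding per (user_id, item_id) cross ID.
cartesian_params = n_user_ids * n_item_ids * dim

# CAN-style: each side keeps only its own embedding table; the cross is
# computed by a micro-net, so no cross table is stored. CAN-style models
# typically use wider embeddings to parameterize the micro-net.
can_dim = 4 * dim
can_params = (n_user_ids + n_item_ids) * can_dim

print(f"Cartesian: {cartesian_params:,} params")
print(f"CAN-style: {can_params:,} params")
print(f"ratio:     {cartesian_params / can_params:.0f}x")
```

Even with the wider per‑side embeddings, the parameter count grows with the sum of the vocabularies rather than their product, which is the source of the order‑of‑magnitude savings the paper reports.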
Key Components of CAN:
Feature extraction, interest extraction, sequence modeling, and feature‑interaction modules.
The Co‑Action Unit receives two feature embeddings: one embedding is reshaped into the weights of a micro‑net (a small MLP), and the other is fed through that micro‑net. The resulting interaction vector is concatenated with the original embeddings before feeding into the final DNN.
Multi‑order polynomial inputs are added to the Co‑Action Unit to capture higher‑order interactions.
Three levels of independence are enforced: parameter independence, combination independence, and order independence, which together improve expressiveness and reduce interference between different feature interactions.
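The components above can be sketched in a few lines of numpy. This is a minimal illustration of the idea as described here, assuming a two‑layer micro‑MLP and an order‑2 polynomial input; the layer sizes and activation are illustrative, not the paper's exact configuration:

```python
import numpy as np

def co_action_unit(p_induce, p_feed, dims=(8, 4, 4)):
    """Micro-net feature interaction: the 'induce'-side embedding is
    reshaped into the weights of a small MLP, and the 'feed'-side
    embedding is passed through that MLP. Sizes are illustrative.
    """
    d0, d1, d2 = dims
    # Split the induce-side embedding into the micro-MLP's weight matrices.
    w1 = p_induce[: d0 * d1].reshape(d0, d1)
    w2 = p_induce[d0 * d1 : d0 * d1 + d1 * d2].reshape(d1, d2)
    # Multi-order inputs: the raw embedding and its element-wise square.
    outputs = []
    for order in (1, 2):
        x = p_feed ** order
        h = np.tanh(x @ w1)      # hidden layer
        outputs.append(h @ w2)   # output layer
    # Sum the per-order outputs into one interaction vector.
    return sum(outputs)

rng = np.random.default_rng(0)
p_induce = rng.normal(size=8 * 4 + 4 * 4)  # item side -> micro-net weights
p_feed = rng.normal(size=8)                # user side -> micro-net input
interaction = co_action_unit(p_induce, p_feed)
print(interaction.shape)  # (4,)
```

Note how parameter independence falls out of the structure: the micro‑net weights belong to one side's embedding rather than to a stored cross table, so different feature combinations do not overwrite each other's parameters.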
Production Deployment: Extensive engineering optimizations were applied to make CAN feasible in a large‑scale advertising system:
Sequence truncation reduces user‑behavior sequence length, improving QPS by ~20%.
Selective feature‑pair pruning cuts the number of combinations from 90 to 48, boosting QPS by ~30%.
Custom kernel implementations for the interaction unit’s matrix operations increase QPS by ~60% and further by ~47% after kernel fusion.
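The kernel‑fusion idea can be approximated in high‑level code by collapsing the per‑pair micro‑net multiplications into a single batched contraction instead of looping over feature pairs. This is a hedged sketch of the principle only; the production kernels are hand‑written, and all shapes below are illustrative:

```python
import numpy as np

batch, n_pairs, d_in, d_out = 32, 48, 8, 4
rng = np.random.default_rng(1)

# Per-pair micro-net weights (in CAN, reshaped from item-side embeddings).
weights = rng.normal(size=(batch, n_pairs, d_in, d_out))
# Per-pair feed-side inputs.
feeds = rng.normal(size=(batch, n_pairs, d_in))

# Naive version: one tiny matmul per (sample, pair) -- many kernel launches.
naive = np.stack(
    [np.stack([feeds[b, p] @ weights[b, p] for p in range(n_pairs)])
     for b in range(batch)]
)

# Fused version: all pairwise interactions in one batched contraction.
fused = np.einsum("bpi,bpio->bpo", feeds, weights)

assert np.allclose(naive, fused)
print(fused.shape)  # (32, 48, 4)
```

Replacing many small matrix multiplications with one large batched operation is the same trade the custom kernels make: fewer launches and better hardware utilization for identically shaped micro‑net computations.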
Empirical results on production traffic show that CAN achieves comparable performance to full Cartesian‑product models with only one‑tenth of the parameter size, delivering significant gains in GAUC, CTR, and RPM.
Summary: CAN demonstrates that micro‑net‑based feature interaction can effectively replace explicit Cartesian products, offering a scalable and high‑performance solution for CTR prediction. Future work includes exploring richer feature‑density handling, higher‑order interaction extensions, and further optimization of the Co‑Action Unit.
References (selected): Bian et al., 2022 (WSDM); Zhu et al., 2021; Qu et al., 2016; Yang et al., 2020; Huang et al., 2019; Lian et al., 2018.
Alimama Tech
Official Alimama tech channel, showcasing all of Alimama's technical innovations.