Automated Feature Data Push for CRM Intelligent Recommendation: Architecture and Implementation
This article details the end‑to‑end automated feature data pipeline for 58 CRM's intelligent recommendation system, covering background, feature store design on HBase, in‑memory feature calculation using dictionaries and matrices, aggregation storage, and future planning for AI‑driven sales optimization.
Background: The CRM system serves 58's sales team. The AI Lab launched an intelligent opportunity allocation project that applies machine learning and recommendation techniques to assign suitable leads to salespeople, improving conversion rates and revenue.
Data Flow Overview: Features are extracted from various opportunity pools, processed, and pushed to dozens of models for training and online prediction, forming the core of the feature push automation.
Feature Store: Implemented on HBase, the store retains at least 1.5 years of feature data with daily updates and uses a zipper‑table approach for storage efficiency. It supports version management, flexible expansion, and the fusion of offline and real‑time data.
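To make the zipper‑table idea concrete, here is a minimal in‑memory sketch. It assumes daily updates arrive in chronological order and that a new row is written only when a value changes. The names `ZipperStore`, `ZipperRow`, and `OPEN_END` are hypothetical, and the sketch stands in for the idea only; it is not the article's HBase schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Tuple

OPEN_END = date(9999, 12, 31)  # sentinel meaning "still the current value"


@dataclass
class ZipperRow:
    value: float
    start: date  # first day the value is valid (inclusive)
    end: date    # first day the value is no longer valid (exclusive)


class ZipperStore:
    """In-memory stand-in for a zipper table (assumption, not the HBase layout)."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], List[ZipperRow]] = {}

    def upsert(self, entity_id: str, feature: str, value: float, day: date) -> None:
        rows = self._rows.setdefault((entity_id, feature), [])
        if rows and rows[-1].value == value:
            return  # unchanged value: the open row already covers this day
        if rows:
            rows[-1].end = day  # close the previous validity interval
        rows.append(ZipperRow(value, day, OPEN_END))

    def lookup(self, entity_id: str, feature: str, day: date) -> Optional[float]:
        for row in self._rows.get((entity_id, feature), []):
            if row.start <= day < row.end:
                return row.value
        return None  # no value recorded for that day

    def history(self, entity_id: str, feature: str) -> List[ZipperRow]:
        return list(self._rows.get((entity_id, feature), []))
```

With this layout, a feature that stays constant for a month occupies one row instead of thirty, while a point‑in‑time lookup still returns the value that was valid on any given day.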
Feature Calculation: Performed in memory using a feature dictionary and a feature matrix. Base features are combined with product lines, calculation functions (COUNT, SUM, MAX, etc.), and time windows to generate derived features. The matrix stores only non‑zero entries, enabling sparse‑matrix computation.
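The combination of base features, product lines, calculation functions, and time windows can be sketched as a Cartesian product. The event tuples, the product‑line names, the 7‑ and 30‑day windows, and the `line.base.FUNC.Nd` naming scheme below are all illustrative assumptions rather than details from the article.

```python
from datetime import date, timedelta
from itertools import product

# Hypothetical raw events: (product_line, base_feature, event_date, value)
EVENTS = [
    ("recruitment", "call_duration", date(2021, 3, 1), 120.0),
    ("recruitment", "call_duration", date(2021, 3, 6), 60.0),
    ("recruitment", "call_duration", date(2021, 2, 10), 300.0),
    ("housing", "call_duration", date(2021, 3, 5), 45.0),
]

FUNCTIONS = {"COUNT": len, "SUM": sum, "MAX": max}
WINDOWS_DAYS = [7, 30]  # assumed window lengths


def derive_features(events, as_of):
    """Build one derived feature per (product line, base feature, function, window)."""
    product_lines = sorted({e[0] for e in events})
    base_features = sorted({e[1] for e in events})
    derived = {}
    for line, base, (fn_name, fn), window in product(
        product_lines, base_features, FUNCTIONS.items(), WINDOWS_DAYS
    ):
        start = as_of - timedelta(days=window)
        values = [
            v for pl, bf, d, v in events
            if pl == line and bf == base and start < d <= as_of
        ]
        if values:  # skip empty combinations so the result stays sparse
            derived[f"{line}.{base}.{fn_name}.{window}d"] = float(fn(values))
    return derived


features = derive_features(EVENTS, as_of=date(2021, 3, 7))
```

One base feature here expands into twelve derived features (two product lines, three functions, two windows), which illustrates how a small dictionary of base features can grow into a large derived feature set.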
Feature Dictionary: Stores metadata for each base feature (dimension, ID, name, explanation, calculation type, time window, source table) to facilitate management and selection across models.
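A dictionary entry with the metadata fields listed above might look like the following sketch. The field names mirror the list in the text, but the `FeatureMeta` class, the example entries, the table names, and the `select_features` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FeatureMeta:
    feature_id: int
    dimension: str         # e.g. the entity the feature describes
    name: str
    explanation: str
    calc_type: str         # COUNT, SUM, MAX, ...
    time_window_days: int
    source_table: str


# Hypothetical entries, for illustration only.
FEATURE_DICT: Dict[int, FeatureMeta] = {
    m.feature_id: m
    for m in [
        FeatureMeta(101, "opportunity", "call_count", "Calls made to the opportunity", "COUNT", 7, "call_log"),
        FeatureMeta(102, "opportunity", "call_duration", "Total call time in seconds", "SUM", 30, "call_log"),
        FeatureMeta(201, "salesperson", "deal_amount", "Largest single deal", "MAX", 90, "order_table"),
    ]
}


def select_features(dimension: Optional[str] = None, calc_type: Optional[str] = None) -> List[int]:
    """Return the sorted feature IDs that match every filter that was given."""
    return sorted(
        fid for fid, meta in FEATURE_DICT.items()
        if (dimension is None or meta.dimension == dimension)
        and (calc_type is None or meta.calc_type == calc_type)
    )
```

A model configuration could then reference features by ID, for example `select_features(dimension="opportunity")`, instead of hard‑coding column names.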
Feature Matrix: The features of each ID are arranged as a sparse matrix in which rows represent feature IDs and columns represent time windows. Calculations run concurrently on this structure, and because only non‑zero entries are kept, storage drops and computation becomes more efficient.
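The sparse layout can be sketched with a plain dictionary keyed by (feature ID, window index). The window list, the `row_sums` aggregation, and the thread pool are assumptions chosen for illustration; the article does not say which concurrency mechanism or sparse format is used. In CPython, threads mainly illustrate the structure here, since CPU‑bound work would usually need processes instead.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

WINDOWS = [7, 30, 90]  # column index -> window length in days (hypothetical)

SparseMatrix = Dict[Tuple[int, int], float]


def build_matrix(dense_rows: Dict[int, List[float]]) -> SparseMatrix:
    """Keep only non-zero cells, keyed by (feature_id, window_index)."""
    return {
        (fid, col): value
        for fid, row in dense_rows.items()
        for col, value in enumerate(row)
        if value != 0
    }


def row_sums(matrix: SparseMatrix) -> Dict[int, float]:
    """Example computation that touches only the stored (non-zero) cells."""
    totals: Dict[int, float] = {}
    for (fid, _col), value in matrix.items():
        totals[fid] = totals.get(fid, 0.0) + value
    return totals


def process_all(per_id_dense: Dict[str, Dict[int, List[float]]], workers: int = 4):
    """Build and aggregate one matrix per ID, with the IDs handled concurrently."""
    ids = list(per_id_dense)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda i: row_sums(build_matrix(per_id_dense[i])), ids)
        return dict(zip(ids, results))
```

Because all‑zero rows never enter the dictionary, an ID with no activity costs no storage, and each computation iterates over stored cells only.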
Aggregation Storage: After computation, features are aggregated per business line, calculation function, and time window, then stored in a unified format (LibSVM) to support both machine‑learning and deep‑learning models, reducing storage by ~70% compared to JSON.
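For reference, a LibSVM line has the form `<label> <index>:<value> ...`, with indices in ascending order and zero values omitted. The sketch below writes and parses such lines; the helper names and the sample feature values are hypothetical, and the `:g` formatting (six significant digits) is a simplification that a production writer may not want.

```python
import json
from typing import Dict, Tuple


def to_libsvm(label: float, features: Dict[int, float]) -> str:
    """Serialize one sample as '<label> <index>:<value> ...', skipping zeros."""
    parts = [f"{idx}:{val:g}" for idx, val in sorted(features.items()) if val != 0]
    return " ".join([f"{label:g}"] + parts)


def from_libsvm(line: str) -> Tuple[float, Dict[int, float]]:
    """Parse a LibSVM line back into a label and a sparse feature dict."""
    label, *pairs = line.split()
    features: Dict[int, float] = {}
    for pair in pairs:
        idx, val = pair.split(":")
        features[int(idx)] = float(val)
    return float(label), features


sample = {3: 0.5, 1: 2.0, 7: 0.0}
line = to_libsvm(1, sample)

# A JSON record with named keys for the same sample, for a rough size comparison.
as_json = json.dumps({"label": 1, "features": {"f1": 2.0, "f3": 0.5, "f7": 0.0}})
```

In this toy case the LibSVM line is shorter than the JSON record because it carries integer indices instead of repeated key names and drops zero entries. The actual saving depends on the data, so this does not reproduce the ~70% figure.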
Summary & Planning: The automated feature push system now satisfies the basic feature needs of downstream AI models, driving personalized search/recommendation in CRM and contributing to revenue growth. Future work includes adding NLP‑generated text features and real‑time feature streams.
Department Introduction: 58 Tongcheng AI Lab, part of the TEG technology platform, focuses on applying AI across products such as intelligent customer service, speech analysis, and the CRM opportunity allocation system.
58 Tech
Official tech channel of 58, a platform for tech innovation, sharing, and communication.