Xianyu SPU System Architecture and Data Pipeline Overview
Xianyu built a custom SPU system and data pipeline that cleans Alibaba’s raw SPU data, defines key, binding, sales and product attributes, stores enriched records in MySQL, syncs to OpenSearch, and supports diverse business scenarios such as inspection, search publishing, and worry‑free purchase.
In e‑commerce, the SPU (Standard Product Unit) and the SKU (Stock Keeping Unit) are the fundamental concepts for storing product data. An SPU represents a generic product using the shortest, most standard description, and acts as a bridge across domains and channels.
The SPU model aggregates the smallest unit of product information and is built upon four attribute types: key attributes (defining a unique product, e.g., brand and model), sales attributes (options that affect purchase, e.g., color and memory), product attributes (additional details such as warranty), and binding attributes (refinements of key attributes). In Alibaba’s ecosystem, an SPU = key attributes + binding attributes + ordinary attributes.
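The four attribute types can be sketched as a small record type. This is a minimal illustration only: the class name, field names, and sample values (including the "network" binding attribute) are assumptions made for the example, not Xianyu's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Spu:
    """Illustrative SPU record; field names are assumptions, not Xianyu's schema."""

    # Key attributes define a unique product, e.g. brand and model.
    key_attrs: dict[str, str]
    # Binding attributes refine the key attributes.
    binding_attrs: dict[str, str] = field(default_factory=dict)
    # Product attributes carry additional details, e.g. warranty.
    product_attrs: dict[str, str] = field(default_factory=dict)
    # Sales attributes list the options a buyer chooses from, e.g. color and memory.
    sales_attrs: dict[str, list[str]] = field(default_factory=dict)

    def identity(self) -> tuple:
        """Key plus binding attributes, sorted so that dict ordering does not matter."""
        return (
            tuple(sorted(self.key_attrs.items())),
            tuple(sorted(self.binding_attrs.items())),
        )


phone = Spu(
    key_attrs={"brand": "Apple", "model": "iPhone 12"},
    binding_attrs={"network": "5G"},
    product_attrs={"warranty": "1 year"},
    sales_attrs={"color": ["black", "white"], "memory": ["64GB", "128GB"]},
)

# Same key and binding attributes, written in a different order and without
# sales or product attributes: still the same product identity.
same_phone = Spu(
    key_attrs={"model": "iPhone 12", "brand": "Apple"},
    binding_attrs={"network": "5G"},
)
```

In this sketch, sales and product attributes deliberately stay out of identity(), mirroring the idea that color or memory options affect the purchase but do not define a different product.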
Why does Xianyu build its own SPU system instead of reusing Alibaba’s mature one? The reasons include: more than 90% of Alibaba’s SPU data proves unusable after cleaning; Xianyu has unique business lines (e.g., coupons, rentals) that must be accommodated; data from multiple sources needs to be mounted onto SPUs; and Xianyu wants a standardized, business‑driven SPU definition that allows operational intervention.
The Xianyu SPU data pipeline is designed around several goals: compatibility with Alibaba’s SPU, customizable attributes for special business lines, globally unique key attributes with international naming and aliases, horizontal extensions (inspection tags, search flags, etc.), and a visual operations platform with an approval workflow.
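One way to picture "globally unique key attributes with aliases" is a lookup that maps every known alias to a single canonical value. The following is a hypothetical sketch: the ALIASES table, its entries, and the canonical_value helper are invented for illustration and do not describe how Xianyu actually stores aliases.

```python
# Hypothetical alias table: attribute name -> {lowercased alias -> canonical value}.
ALIASES = {
    "brand": {
        "apple": "Apple",
        "苹果": "Apple",
        "huawei": "Huawei",
        "华为": "Huawei",
    },
}


def canonical_value(attr: str, raw: str) -> str:
    """Return the canonical value for a raw attribute value.

    Unknown values are returned trimmed but otherwise unchanged, so a new
    brand is not silently dropped before someone can review it.
    """
    trimmed = raw.strip()
    return ALIASES.get(attr, {}).get(trimmed.lower(), trimmed)
```

For example, canonical_value("brand", " 苹果 ") and canonical_value("brand", "APPLE") both resolve to "Apple", so two listings that spell the brand differently end up under the same key attribute.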
Data flow: raw Alibaba SPU data → ODPS cleaning (removing dirty characters and incomplete records) → reduction to ~30% of the original volume → manual selection by operations staff → enrichment with business flags (biz, bizProperty) → storage in MySQL (with an auto‑incremented spu_id) → synchronization to OpenSearch for real‑time queries. This pipeline addresses data completeness, ID management, real‑time updates without API pushes, and flexible indexing with fuzzy search and relevance ranking.
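The cleaning, enrichment, and storage steps above can be sketched in a few functions. This is a simplified illustration under stated assumptions, not Xianyu's implementation: an in-memory SQLite table stands in for MySQL, the ODPS job is replaced by a plain Python function, the required fields and the definition of "dirty characters" (control characters) are guesses, the biz value "inspection" is made up, and the OpenSearch synchronization step is omitted.

```python
import json
import re
import sqlite3

# Assumed minimum fields for a record to count as complete.
REQUIRED_FIELDS = ("brand", "model", "title")
# Control characters are treated as "dirty" in this sketch.
_DIRTY = re.compile(r"[\x00-\x1f\x7f]")


def clean(raw_records):
    """Strip control characters and drop records missing a required field."""
    cleaned = []
    for record in raw_records:
        fixed = {
            key: _DIRTY.sub("", value).strip()
            for key, value in record.items()
            if isinstance(value, str)
        }
        if all(fixed.get(name) for name in REQUIRED_FIELDS):
            cleaned.append(fixed)
    return cleaned


def enrich(record, biz, biz_property):
    """Attach the business flags named in the text (biz, bizProperty)."""
    return {**record, "biz": biz, "bizProperty": biz_property}


def store(conn, records):
    """Insert records and return the auto-incremented spu_id of each one."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS spu ("
        "spu_id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
    )
    ids = []
    for record in records:
        cursor = conn.execute(
            "INSERT INTO spu (payload) VALUES (?)",
            (json.dumps(record, ensure_ascii=False),),
        )
        ids.append(cursor.lastrowid)
    conn.commit()
    return ids


raw = [
    {"brand": "Apple", "model": "iPhone 12\x00", "title": " iPhone 12 "},
    {"brand": "", "model": "X", "title": "record with no brand"},
    {"brand": "Huawei", "model": "P40"},  # title missing
]

conn = sqlite3.connect(":memory:")
cleaned = clean(raw)
enriched = [enrich(r, biz="inspection", biz_property={"inspectable": True}) for r in cleaned]
ids = store(conn, enriched)
```

Only the first sample record survives cleaning. Letting the database assign spu_id keeps ID management in one place, which is the property the text attributes to the MySQL step.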
SPU data includes attributes, bindings, sales attributes, images, titles, plus extended fields such as Taobao category, Xianyu channel category, business identifiers, and platform publishing metrics, providing a scalable foundation for multiple business scenarios.
Three business scenarios build on Xianyu SPU. The Inspection Service (验货宝) uses it to validate whether a product is eligible for inspection and to ensure the inspection items are complete. The SPU Search Publisher matches a listing to an existing SPU, which reduces the user's publishing cost and improves the experience. Worry‑Free Purchase uses the SPU as the product entity exposed to external services.
Xianyu Technology
Official account of the Xianyu technology team