User Portrait Scenarios and Technical Implementation Solutions
This article gives an overview of user portrait (user profile) applications across industries. It covers common scenarios, product functions, and a step‑by‑step technical solution spanning data collection, tag management, ETL pipelines, and a service architecture for both real‑time and offline processing.
Guest Speaker: Zhao Hongtian, Big Data Architect
Editor: Su Liping, CaiXun Co., Ltd.
Platform: DataFunTalk
01 Common Application Scenarios
Data sources and portrait requirements differ by industry. The scenarios below (consumer‑facing internet, e‑commerce, security, and internet finance) show how 360‑degree user profiles support personalized content, marketing, and risk monitoring.
Consumer‑facing (ToC) internet products can collect registration information, strongly authenticated identity data, in‑app behavior, and data submitted through forms, enabling content recommendation, personalized services, and SOP‑based marketing.
E‑commerce platforms use registration, behavior, and attribute data to drive SMS, email, and app push campaigns, as well as VIP services.
Security applications gather identity, travel, and facial recognition data to build comprehensive safety profiles, enabling real‑time alerts and risk mitigation.
Internet finance aggregates registration, behavior, and device data to create 360° profiles for risk assessment, loan eligibility, and tailored operational strategies.
02 Portrait Product Functions
Key functions include tag metadata management, single‑user portraits, crowd selection, crowd analysis, behavior analysis, and tag‑based SOP automation.
1. Tag Metadata Management
Tags are organized into hierarchical categories by user attribute and behavior; each tag's metadata shows its source data, daily output volume, and user coverage.
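A minimal sketch of how the tag catalog might be modeled, assuming an in‑memory structure. The field names (`category_path`, `source_table`, `daily_output`) and the example tags are hypothetical, and coverage is taken here to mean tagged users divided by total users.

```python
from dataclasses import dataclass


@dataclass
class TagMeta:
    """Metadata for one tag; all field names are illustrative assumptions."""
    name: str
    category_path: tuple[str, ...]  # position in the hierarchy, e.g. ("Behavior", "Purchase")
    source_table: str               # upstream table the tag is computed from
    daily_output: int               # users tagged by the latest daily run

    def coverage(self, total_users: int) -> float:
        """Share of all users carrying this tag (0.0 when there are no users)."""
        return self.daily_output / total_users if total_users else 0.0


def build_tree(tags: list[TagMeta]) -> dict:
    """Nest tag names under their category path so a UI can render the hierarchy."""
    tree: dict = {}
    for tag in tags:
        node = tree
        for level in tag.category_path:
            node = node.setdefault(level, {})
        node.setdefault("_tags", []).append(tag.name)
    return tree


tags = [
    TagMeta("high_spender", ("Behavior", "Purchase"), "dws_order_daily", 12_000),
    TagMeta("age_25_34", ("Attribute", "Demographics"), "ods_user_register", 40_000),
]
print(build_tree(tags))
print(f"{tags[0].coverage(100_000):.1%}")  # 12.0%
```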
2. Single‑User Portrait
Once tags are computed, a user's complete tag set can be retrieved through the UI or an API. The two main use cases are internal analysis and high‑concurrency calls from online services.
3. Crowd Selection
Combines tag rules to filter users into target groups for subsequent operations.
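Crowd selection is often implemented as set algebra over an inverted index from tag value to user IDs. A minimal sketch, using Python sets where a production system might use compressed bitmaps; the nested‑tuple rule format and the index contents are hypothetical.

```python
# Hypothetical inverted index: (tag, value) -> IDs of users carrying that value.
INDEX = {
    ("city", "Beijing"): {1, 2, 3, 5},
    ("gender", "F"): {2, 3, 4},
    ("high_spender", "1"): {3, 4, 5},
}
ALL_USERS = {1, 2, 3, 4, 5}


def evaluate(rule) -> set[int]:
    """Evaluate a rule tree. A leaf is a (tag, value) pair; an inner node is
    ("AND", *children), ("OR", *children) or ("NOT", child)."""
    op = rule[0]
    if op == "AND":
        return set.intersection(*(evaluate(r) for r in rule[1:]))
    if op == "OR":
        return set.union(*(evaluate(r) for r in rule[1:]))
    if op == "NOT":
        return ALL_USERS - evaluate(rule[1])
    return set(INDEX.get(rule, ()))  # leaf: copy so callers cannot mutate the index


# Women in Beijing, plus high spenders outside Beijing.
rule = ("OR",
        ("AND", ("city", "Beijing"), ("gender", "F")),
        ("AND", ("high_spender", "1"), ("NOT", ("city", "Beijing"))))
print(sorted(evaluate(rule)))  # [2, 3, 4]
```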
4. Crowd Analysis
Multi‑dimensional analysis of selected groups produces reports and detailed user/tag lists.
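One simple form of crowd analysis is the distribution of a selected crowd across one or more tag dimensions. A sketch, assuming a hypothetical in‑memory wide table `PROFILES`; in the architecture described here this would more likely be a GROUP BY query against the OLAP store.

```python
from collections import Counter

# Hypothetical wide table: user_id -> tag values.
PROFILES = {
    1: {"city": "Beijing", "age_band": "25-34"},
    2: {"city": "Beijing", "age_band": "35-44"},
    3: {"city": "Shanghai", "age_band": "25-34"},
    4: {"city": "Beijing", "age_band": "25-34"},
}


def analyze(crowd: set[int], dims: list[str]) -> dict[tuple, float]:
    """Share of the crowd falling into each combination of the given dimensions."""
    counts = Counter(
        tuple(PROFILES[uid].get(d, "unknown") for d in dims)
        for uid in crowd if uid in PROFILES
    )
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()} if total else {}


report = analyze({1, 2, 4}, ["city", "age_band"])
for key, share in sorted(report.items()):
    print(key, f"{share:.0%}")
```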
5. Behavior Analysis
Implements common analytical models (retention, event, funnel, distribution) by converting analyst SQL into productized data models.
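As an example of turning analyst SQL into a reusable model, the sketch below computes an ordered funnel with a time window, similar in spirit to ClickHouse's `windowFunnel` aggregate function. The event log and step names are hypothetical, and each user is counted at the deepest step reached from any single occurrence of the first step.

```python
from collections import defaultdict

# Hypothetical event log rows: (user_id, event_name, unix_timestamp).
EVENTS = [
    ("u1", "view", 100), ("u1", "add_cart", 160), ("u1", "pay", 400),
    ("u2", "view", 100), ("u2", "add_cart", 5000),
    ("u3", "add_cart", 50), ("u3", "view", 60),
]


def funnel(events, steps: list[str], window: int) -> list[int]:
    """Count users reaching each step in order, with every matched step no more
    than `window` seconds after the first step. Counts are cumulative per step."""
    by_user = defaultdict(list)
    for uid, name, ts in events:
        by_user[uid].append((ts, name))
    reached = [0] * len(steps)
    for rows in by_user.values():
        rows.sort()
        best = 0
        for i, (ts, name) in enumerate(rows):
            if name != steps[0]:
                continue
            depth = 1  # greedily match the remaining steps in time order
            for ts2, name2 in rows[i + 1:]:
                if ts2 - ts > window or depth == len(steps):
                    break
                if name2 == steps[depth]:
                    depth += 1
            best = max(best, depth)
        for d in range(best):
            reached[d] += 1
    return reached


print(funnel(EVENTS, ["view", "add_cart", "pay"], window=3600))  # [3, 1, 1]
```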
6. Tag‑Based SOP
Standardized operating procedures trigger automated messages based on tags, enabling personalized outreach.
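A tag‑based SOP can be reduced to an ordered list of (condition, channel, template) rules evaluated against each user's tags. The rule names, tags, and templates below are hypothetical, and delivery to actual SMS, email, or push providers is left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable

Tags = dict[str, str]


@dataclass
class SopRule:
    """One step of a hypothetical tag-based SOP."""
    name: str
    condition: Callable[[Tags], bool]
    channel: str   # e.g. "sms", "email", "push"
    template: str  # filled from the user's tags; missing tags render as ""


RULES = [
    SopRule("cart_abandon", lambda t: t.get("cart_abandoned") == "1",
            "push", "Hi {nickname}, your cart is waiting."),
    SopRule("vip_gift", lambda t: t.get("vip_level") == "gold",
            "sms", "Dear {nickname}, a gold-member gift awaits you."),
]


def plan_messages(users: dict[str, Tags]) -> list[tuple[str, str, str]]:
    """Return (user_id, channel, text) for the first matching rule per user."""
    out = []
    for uid, tags in users.items():
        for rule in RULES:
            if rule.condition(tags):
                text = rule.template.format_map(defaultdict(str, tags))
                out.append((uid, rule.channel, text))
                break  # at most one message per user per run
    return out


users = {
    "u1": {"nickname": "Amy", "cart_abandoned": "1", "vip_level": "gold"},
    "u2": {"nickname": "Bo", "vip_level": "gold"},
    "u3": {"nickname": "Cy"},
}
for msg in plan_messages(users):
    print(msg)
```

Rule order doubles as priority here: a user matching several rules receives only the first message, which is one simple way to avoid over‑messaging.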
03 Technical Implementation Plan
1. Overall Data Flow
Data is collected from logs and business databases, stored in an ODS layer, processed into tags and wide tables, then moved to DWS and service layers for OLAP and API consumption.
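The step from per‑tag results to a wide table is essentially a pivot: one long row per (user, tag) becomes one row per user with one column per tag. A pure‑Python sketch with hypothetical tag names; in Spark the same reshaping is usually done with `groupBy(...).pivot(...)`.

```python
from typing import Optional

# Hypothetical output of individual tag jobs: one "long" row per (user, tag).
TAG_ROWS = [
    ("u1", "gender", "F"), ("u1", "age_band", "25-34"),
    ("u2", "gender", "M"), ("u2", "high_spender", "1"),
]


def to_wide(rows, columns: list[str]) -> list[dict[str, Optional[str]]]:
    """Pivot (user_id, tag, value) rows into one dict per user with fixed columns."""
    wide: dict[str, dict[str, Optional[str]]] = {}
    for uid, tag, value in rows:
        if tag not in columns:
            continue  # ignore tags not registered in the wide-table schema
        row = wide.setdefault(uid, {"user_id": uid, **{c: None for c in columns}})
        row[tag] = value
    return sorted(wide.values(), key=lambda r: r["user_id"])


for row in to_wide(TAG_ROWS, ["gender", "age_band", "high_spender"]):
    print(row)
```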
2. System Blueprint
The blueprint covers tag planning, data development, and application deployment; new tags or scenarios can be added within the existing framework.
3. ETL Scheduling Model
Handles tag computation, validation, crowd calculation, and wide‑table generation, feeding results to Redis, ClickHouse, or Elasticsearch for service calls.
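The stages of the scheduling model form a dependency graph, so they have to run in topological order. A sketch using the standard library's `graphlib.TopologicalSorter` (Python 3.9+); the job names and dependencies are hypothetical, and a production deployment would rely on a workflow scheduler rather than this loop.

```python
from graphlib import TopologicalSorter

# Hypothetical daily job graph: job -> set of jobs it depends on.
DAG = {
    "tag_gender": {"ods_sync"},
    "tag_high_spender": {"ods_sync"},
    "validate_tags": {"tag_gender", "tag_high_spender"},
    "crowd_calc": {"validate_tags"},
    "wide_table": {"validate_tags"},
    "export_redis": {"wide_table"},
    "export_clickhouse": {"wide_table"},
    "export_es": {"crowd_calc", "wide_table"},
}


def plan_batches(dag) -> list[list[str]]:
    """Group jobs into batches; jobs in one batch have no mutual dependencies
    and could run in parallel once the previous batch has finished."""
    ts = TopologicalSorter(dag)
    ts.prepare()
    batches = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        batches.append(ready)
        ts.done(*ready)  # mark the batch finished, unlocking its dependents
    return batches


for i, batch in enumerate(plan_batches(DAG)):
    print(i, batch)
```

Placing `validate_tags` between tag computation and everything downstream means a failed quality check stops bad tags from reaching crowds, the wide table, or the serving stores.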
4. Technology Stack
Combines big‑data components (Hive, Spark, HBase, ClickHouse, ES) with application services (Java/Scala, microservice frameworks).
5. Stack Challenges
Big‑data: Selecting appropriate storage for different query patterns (e.g., Redis/HBase for single‑user, ClickHouse for OLAP, ES for keyword search) and optimizing daily tag‑generation jobs.
Application Services: Implementing CRUD for tag management and handling high‑concurrency API requests for portrait queries.
04 Q&A
Q: What data‑security measures are taken for user portrait data?
A: Sensitive data is excluded at collection; access control is enforced via API permissions, allowing only authorized groups to view specific tags.
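Tag‑level access control can be expressed as a mapping from user group to readable tags, applied to each API response. The groups and tag names below are hypothetical; the sketch denies by default, so a caller with no recognized group sees no tags.

```python
# Hypothetical permission table: group -> tag names that group may read.
GROUP_TAG_ACL = {
    "marketing": {"age_band", "city", "high_spender"},
    "risk": {"age_band", "city", "credit_score"},
}


def authorized_view(portrait: dict[str, str], groups: set[str]) -> dict[str, str]:
    """Keep only the tags that at least one of the caller's groups may read."""
    allowed = set().union(*(GROUP_TAG_ACL.get(g, set()) for g in groups))
    return {k: v for k, v in portrait.items() if k in allowed}


portrait = {"age_band": "25-34", "city": "Beijing", "credit_score": "712", "high_spender": "1"}
print(authorized_view(portrait, {"marketing"}))
print(authorized_view(portrait, {"intern"}))  # {}
```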
Q: How should a company integrate an existing recommendation system with a new tag‑portrait platform?
A: Align business‑driven tag definitions across departments to avoid duplication and ensure consistent usage.
Q: What methodologies exist for designing a second‑level tag hierarchy?
A: Base the hierarchy on business scenarios, involve product managers and domain experts, and iterate based on usage; reference industry literature on tag taxonomy.
Q: How to handle offline vs. real‑time tags?
A: Offline tags cover the majority of use cases; real‑time tags are used for immediate push scenarios and are often derived from streaming jobs rather than traditional tag tables.
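A real‑time tag of the kind used for immediate pushes can be computed directly from the event stream rather than read from a tag table. A minimal sketch, assuming events arrive in timestamp order per user; the class name, event name, threshold, and window are hypothetical, and in practice this logic would live inside a streaming job.

```python
from collections import defaultdict, deque


class SlidingWindowTag:
    """Hypothetical real-time tag: fires when a user emits `threshold` matching
    events within `window` seconds. Assumes per-user events arrive in time order."""

    def __init__(self, event_name: str, threshold: int, window: int):
        self.event_name = event_name
        self.threshold = threshold
        self.window = window
        self._recent = defaultdict(deque)  # user_id -> timestamps still in the window

    def on_event(self, user_id: str, event_name: str, ts: int) -> bool:
        """Consume one event; return True when the tag condition is met."""
        if event_name != self.event_name:
            return False
        q = self._recent[user_id]
        q.append(ts)
        while q and ts - q[0] >= self.window:
            q.popleft()  # drop events that fell out of the window
        if len(q) >= self.threshold:
            q.clear()  # reset so the push is not re-triggered on every later event
            return True
        return False


tag = SlidingWindowTag("view_item", threshold=3, window=600)
stream = [("u1", "view_item", 0), ("u1", "view_item", 100), ("u1", "view_item", 900),
          ("u1", "view_item", 950), ("u1", "view_item", 1000)]
fired = [tag.on_event(*e) for e in stream]
print(fired)  # [False, False, False, False, True]
```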
Thank you for reading.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.