Real-Time UV and PV Analytics with Flink SQL on Tencent Cloud Oceanus
This article explains how to build a real-time UV (Unique Visitor) and PV (Page View) analytics pipeline with Flink SQL on the Tencent Cloud Oceanus platform. A self-built Kafka cluster feeds the Flink job, and Redis stores the deduplicated visitor set, the page view counts, and the conversion rate, all aggregated with hop windows.
Solution Overview:
The solution combines a self-built Kafka cluster in a local IDC, Tencent Cloud Oceanus (Flink), and Cloud Redis to analyze and visualize UV, PV, and conversion rate metrics in real time for sites such as blogs and e-commerce websites.
Key Concepts:
UV (Unique Visitor): Number of unique visitors. If a user visits the same page 5 times, UV only increases by 1 as it counts deduplicated users.
PV (Page View): Number of page views. If a user visits the same page 5 times, PV increases by 5.
Conversion Rate: Number of transactions divided by the number of page views.
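As a small worked example of these definitions, the query below computes all three metrics over four inline sample rows. The rows are made up for illustration: user 6 browses twice and buys once, and user 7 browses once. This is a minimal sketch in standard SQL that Flink SQL should also accept; the alias t is hypothetical, and the column names mirror the record fields used later in this article (record_type 0 = browse, 1 = purchase).

```sql
-- Sample rows are illustrative only: (record_type, user_id)
SELECT
  COUNT(DISTINCT user_id)                      AS uv,         -- distinct visitors
  COUNT(CASE WHEN record_type = 0 THEN 1 END)  AS pv,         -- browse records
  COUNT(CASE WHEN record_type = 1 THEN 1 END)  AS purchases,  -- purchase records
  -- NULLIF avoids division by zero when there are no browse records
  CAST(COUNT(CASE WHEN record_type = 1 THEN 1 END) AS DOUBLE)
    / NULLIF(COUNT(CASE WHEN record_type = 0 THEN 1 END), 0) AS conversion_rate
FROM (VALUES (0, 6), (0, 6), (0, 7), (1, 6)) AS t(record_type, user_id);
```

For these rows UV is 2 (users 6 and 7), PV is 3, and the conversion rate is 1/3.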
Architecture Components:
Self-built Kafka cluster in local IDC
Private Network (VPC)
Direct Connect/Cloud Connect/VPN/Peer Connection
Oceanus (Flink)
Cloud Redis
Implementation Steps:
1. Create VPC network
2. Create Oceanus cluster
3. Create Redis cluster
4. Configure the self-built Kafka cluster: set advertised.listeners to an IP address instead of a hostname
5. Establish network connectivity between IDC and Tencent Cloud via VPN
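Step 4 matters because Kafka clients connect to whatever address the broker advertises, so a hostname that resolves only inside the IDC would be unreachable from Oceanus. A minimal sketch of the relevant broker settings in server.properties follows. The address 10.1.0.10:9092 is taken from the bootstrap.servers value used in the source table of this article; the PLAINTEXT listener and the 0.0.0.0 bind address are assumptions that should be adapted to the actual broker.

```properties
# Bind address of the broker (assumption: listen on all interfaces)
listeners=PLAINTEXT://0.0.0.0:9092
# Address returned to clients; use an IP reachable from the Oceanus VPC, not a hostname
advertised.listeners=PLAINTEXT://10.1.0.10:9092
```

The broker needs a restart for the change to take effect.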
Data Format:
Kafka topic stores data in JSON format:
{"record_type":0, "user_id": 6, "client_ip": "100.0.0.6", "product_id": 101, "create_time": "2021-09-06 16:00:00"}
Here record_type 0 is a browse record and record_type 1 is a purchase record.
Flink SQL Implementation:
Source table definition:
CREATE TABLE `input_web_record` (
  `record_type` INT,        -- 0 = browse record, 1 = purchase record
  `user_id`     INT,
  `client_ip`   VARCHAR,
  `product_id`  INT,
  `create_time` TIMESTAMP,
  `times` AS create_time,   -- event-time column derived from create_time
  -- Watermark lags 10 minutes behind, so events up to 10 minutes out of order are accepted
  WATERMARK FOR times AS times - INTERVAL '10' MINUTE
) WITH (
  'connector' = 'kafka',
  'topic' = 'uvpv-demo',
  'scan.startup.mode' = 'earliest-offset',
  'properties.bootstrap.servers' = '10.1.0.10:9092',
  'properties.group.id' = 'WebRecordGroup',
  'format' = 'json',
  'json.ignore-parse-errors' = 'true',
  'json.fail-on-missing-field' = 'false'
);
Three sink tables write the results to Redis: UV as a Redis SET, PV as a Redis LIST, and the conversion rate as a Redis STRING.
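The sink definitions are not reproduced here, so the following is only a sketch of what they might look like. The table and column names, the 'redis' connector name, and the 'command', 'nodes', and 'password' options are all assumptions based on how the Oceanus Redis connector is commonly configured; check them against the connector documentation before use. The address and password are placeholders.

```sql
-- UV: SADD adds each user id to a Redis SET, which deduplicates automatically
CREATE TABLE `output_uv` (
  `userids` STRING,   -- Redis key
  `user_id` STRING    -- set member
) WITH (
  'connector' = 'redis',            -- assumed connector name
  'command'   = 'sadd',             -- assumed option: Redis write command
  'nodes'     = '<redis-host>:6379',
  'password'  = '<password>'
);

-- PV: LPUSH prepends each windowed count to a Redis LIST
CREATE TABLE `output_pv` (
  `pagevisits` STRING,  -- Redis key
  `pv`         STRING   -- list element
) WITH (
  'connector' = 'redis',
  'command'   = 'lpush',
  'nodes'     = '<redis-host>:6379',
  'password'  = '<password>'
);

-- Conversion rate: SET overwrites a Redis STRING with the latest value
CREATE TABLE `output_conversion_rate` (
  `conversion_rate` STRING,  -- Redis key
  `rate`            STRING   -- string value
) WITH (
  'connector' = 'redis',
  'command'   = 'set',
  'nodes'     = '<redis-host>:6379',
  'password'  = '<password>'
);
```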
The business logic aggregates the records with HOP (hopping) windows over 10-minute intervals.
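A hop-window aggregation over the input_web_record source table could be sketched as follows. The 10-minute window size comes from the description above, while the 5-minute slide is an assumption chosen for illustration, as are the output column names. The query uses Flink's group window syntax, where HOP takes the time attribute, then the slide, then the size.

```sql
SELECT
  HOP_START(times, INTERVAL '5' MINUTE, INTERVAL '10' MINUTE) AS window_start,
  HOP_END(times, INTERVAL '5' MINUTE, INTERVAL '10' MINUTE)   AS window_end,
  COUNT(CASE WHEN record_type = 0 THEN 1 END) AS pv,          -- browse records in the window
  COUNT(CASE WHEN record_type = 1 THEN 1 END) AS purchases,   -- purchase records in the window
  -- NULLIF yields NULL instead of an error when a window has no browse records
  CAST(COUNT(CASE WHEN record_type = 1 THEN 1 END) AS DOUBLE)
    / NULLIF(COUNT(CASE WHEN record_type = 0 THEN 1 END), 0) AS conversion_rate
FROM input_web_record
GROUP BY HOP(times, INTERVAL '5' MINUTE, INTERVAL '10' MINUTE);
```

With a slide shorter than the window size, each record falls into more than one window, so consecutive results overlap. Setting the slide equal to the size would give non-overlapping 10-minute windows instead.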
Result Storage:
userids: Stores UV using Redis SET type for deduplication
pagevisits: Stores PV using Redis LIST type
conversion_rate: Stores the conversion rate (purchases / page views) using Redis STRING type
For large-scale UV deduplication, Redis HyperLogLog can be used instead of a SET to keep the memory footprint minimal, at the cost of an approximate count.
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.