
H5 Tracking Solution and Data Warehouse Design for User Behavior Analysis

The vivo Internet Big Data team presents a standardized, extensible H5 tracking solution that automates data collection via a JavaScript SDK for navigation, focus/blur, and visibility events, incorporates privacy safeguards, and feeds a multi‑layer data‑warehouse architecture with unified ID mapping and bitmap‑based retention modeling to support comprehensive user‑behavior dashboards and future advanced analyses.

vivo Internet Technology

This article, authored by the vivo Internet Big Data team, presents a comprehensive H5 tracking (埋点) solution aimed at making user behavior analysis more efficient. It describes how the H5 tracking scheme is planned and how data is collected, and offers a reusable data acquisition strategy.

Background: H5 pages are widely used in web‑app development because of their flexibility and rich functionality. Existing H5 tracking implementations are flexible, but each one repeats similar development and testing work, which wastes resources. A standardized, extensible tracking model is proposed to reduce this labor and speed up data analysis.

Analysis Model Overview: The model defines three analysis themes: basic analysis (PV, UV, session length, new vs. returning users), page analysis (PV, UV, and dwell time per page), and retention analysis (new‑user and active‑user retention, N‑day and day‑N retention). The corresponding metrics are defined, and sample data (illustrative only) are shown.
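To make the page and retention metrics concrete, here is a small sketch. The event shape `{ userId, page, enterTs, leaveTs }` and all function names are hypothetical, and the two retention definitions are an assumption rather than the article's exact wording: "day-N" is read as active exactly on the Nth day after the first active day, and "N-day" as active at least once on days 1 through N.

```javascript
// Hypothetical event shape: { userId, page, enterTs, leaveTs } (timestamps in ms).
// Returns PV, UV and average dwell time for each page.
function pageMetrics(events) {
  const byPage = new Map();
  for (const e of events) {
    let m = byPage.get(e.page);
    if (!m) {
      m = { pv: 0, users: new Set(), totalDwellMs: 0 };
      byPage.set(e.page, m);
    }
    m.pv += 1;                                            // every view counts toward PV
    m.users.add(e.userId);                                // distinct users give UV
    m.totalDwellMs += Math.max(0, e.leaveTs - e.enterTs); // ignore negative clock skew
  }
  const result = {};
  for (const [page, m] of byPage) {
    result[page] = { pv: m.pv, uv: m.users.size, avgDwellMs: m.totalDwellMs / m.pv };
  }
  return result;
}

// A cohort is an array with one entry per user: the day offsets on which that
// user was active, where 0 is the user's first active day.
// Assumed definition: day-N retention = active exactly on day N.
function dayNRetention(cohort, n) {
  if (cohort.length === 0) return 0;
  return cohort.filter(days => days.includes(n)).length / cohort.length;
}

// Assumed definition: N-day retention = active at least once on days 1..N.
function nDayRetention(cohort, n) {
  if (cohort.length === 0) return 0;
  return cohort.filter(days => days.some(d => d >= 1 && d <= n)).length / cohort.length;
}
```

For a cohort `[[0, 1], [0, 3], [0]]`, day-1 retention is 1/3 while 3-day retention is 2/3, which shows why the two definitions must be kept apart on a dashboard.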

Automatic Collection: To reduce developers' workload, an automatic collection mechanism is introduced through a JavaScript SDK (h5sdk.js). It covers three rule‑based scenarios:

Page navigation (URL change) triggers data collection.

Page focus/blur events capture when the page gains or loses focus.

Visibility change events capture browser tab switches.

For each scenario, the article provides detailed rule definitions and implementation notes, including handling of SPA vs. MPA routing, URL change detection, and event listener registration.

Code Example – Overriding History Methods:

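/**
 * Wrap a History API method so that each call also dispatches a custom event
 * on window (pushState/replaceState do not fire any event natively).
 * @param {string} type  name of the history method to override, e.g. 'pushState' or 'replaceState'
 */
function resetHistoryFun(type){
    // Keep a reference to the original method
    let originMethod = window.history[type]
    // The function returned here runs whenever window.history[type] is called
    return function(){
        // Call the original method first
        let rs = originMethod.apply(this, arguments)
        // Then create a custom event named after the method ('pushstate' / 'replacestate')
        let e = new Event(type.toLowerCase())
        // Attach the original call's arguments to the event (a plain Event carries none)
        e.arguments = arguments
        // Fire the event manually
        window.dispatchEvent(e)
        return rs;
    }
}
window.history.pushState = resetHistoryFun('pushState') // override the native pushState
window.history.replaceState = resetHistoryFun('replaceState') // override the native replaceState
// reportBothEvent is the SDK's reporting callback; its definition is not shown in the article
window.addEventListener('pushstate', reportBothEvent)
window.addEventListener('replacestate', reportBothEvent)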
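Overriding `pushState`/`replaceState` only covers programmatic navigation. Back/forward navigation fires `popstate`, and hash-based SPA routing fires `hashchange`, so a URL-change rule usually listens for those as well. The sketch below is an assumption about how the listeners could be combined, not the SDK's actual code: the hypothetical `createUrlChangeWatcher` reports only when the URL really changed, so a `replaceState` that keeps the same URL is not counted as a new page view.

```javascript
// Hypothetical helper: calls onChange(from, to) only when getUrl() returns a new value.
function createUrlChangeWatcher(getUrl, onChange) {
  let lastUrl = getUrl();
  return function check() {
    const url = getUrl();
    if (url !== lastUrl) {
      const from = lastUrl;
      lastUrl = url;
      onChange(from, url);
    }
  };
}

// Attach the same check to every event that can signal a route change:
// the custom events dispatched by the overridden history methods, plus the
// native popstate (back/forward) and hashchange (hash routing) events.
function installUrlListeners(target, check) {
  ['pushstate', 'replacestate', 'popstate', 'hashchange'].forEach(type =>
    target.addEventListener(type, check));
}

// Browser wiring (illustrative):
// const check = createUrlChangeWatcher(() => location.href, (from, to) => { /* report */ });
// installUrlListeners(window, check);
```

Because all four events share one deduplicating check, an event pair that describes a single navigation is reported once.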

Code Example – Focus/Blur Listeners:

window.addEventListener('focus', () => {
    console.log('Page gained focus');
});
window.addEventListener('blur', () => {
    console.log('Page lost focus');
});

Code Example – Visibility Change Listener:

document.addEventListener('visibilitychange', () => {
    if (document.hidden) {
        console.log('Page hidden (user left)');
    } else {
        console.log('Page visible (user returned)');
    }
});
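The focus/blur and visibility signals are what a per-page dwell time can be built on. A minimal sketch, under two assumptions that are not stated in the article: the page counts as active only while it is both visible and focused, and it starts out visible and focused. Switching tabs typically produces both a `blur` and a `visibilitychange`. The hypothetical `createDwellTracker` derives one active state from both signals, so such duplicates are not double-counted. The clock is injected so the logic can be tested.

```javascript
// Hypothetical dwell-time tracker; `now` returns the current time in ms.
function createDwellTracker(now) {
  let visible = true;   // assumed initial state
  let focused = true;   // assumed initial state
  let activeSince = now();
  let totalMs = 0;

  const isActive = () => visible && focused;

  // Apply a new (visible, focused) pair and account only for real transitions.
  function update(nextVisible, nextFocused) {
    const wasActive = isActive();
    visible = nextVisible;
    focused = nextFocused;
    const active = isActive();
    if (wasActive && !active) {
      totalMs += now() - activeSince;   // close the current active stretch
    } else if (!wasActive && active) {
      activeSince = now();              // open a new active stretch
    }
  }

  return {
    onVisibility(hidden) { update(!hidden, focused); },
    onFocus() { update(visible, true); },
    onBlur() { update(visible, false); },
    // Active time so far, including the stretch in progress.
    dwellMs() { return totalMs + (isActive() ? now() - activeSince : 0); },
  };
}

// Browser wiring (illustrative):
// const tracker = createDwellTracker(() => Date.now());
// window.addEventListener('focus', tracker.onFocus);
// window.addEventListener('blur', tracker.onBlur);
// document.addEventListener('visibilitychange', () => tracker.onVisibility(document.hidden));
```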

The article also covers data reporting methods (XMLHttpRequest and navigator.sendBeacon), compatibility constraints (IE6/7/8 are not supported), and fault tolerance: SDK code is wrapped in try‑catch blocks, and caught exceptions are reported as error events.
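A sketch of that reporting path, assuming `navigator.sendBeacon` is tried first and `XMLHttpRequest` is the fallback. Both are injected through a hypothetical `transports` object so the logic can run outside a browser. The endpoint and the `sdk_error` event ID are placeholders. `safely` shows the try‑catch idea: a failure inside tracking code is reported instead of propagating into the host page.

```javascript
// Hypothetical reporter; `transports` = { sendBeacon?, createXhr }.
function createReporter(endpoint, transports) {
  function send(payload) {
    const body = JSON.stringify(payload);
    // Prefer sendBeacon: it returns true once the browser has queued the request,
    // and it keeps working while the page is being unloaded.
    if (typeof transports.sendBeacon === 'function' && transports.sendBeacon(endpoint, body)) {
      return 'beacon';
    }
    // Fall back to an asynchronous XMLHttpRequest POST.
    const xhr = transports.createXhr();
    xhr.open('POST', endpoint, true);
    xhr.setRequestHeader('Content-Type', 'text/plain;charset=UTF-8');
    xhr.send(body);
    return 'xhr';
  }

  // Run tracking code; on failure, report an error event and return undefined.
  function safely(fn) {
    try {
      return fn();
    } catch (err) {
      try {
        send({ eventId: 'sdk_error', message: String((err && err.message) || err) });
      } catch (_) {
        // Reporting the error must never throw into the page either.
      }
      return undefined;
    }
  }

  return { send, safely };
}

// Browser wiring (illustrative):
// const reporter = createReporter('https://collector.example/log', {
//   sendBeacon: navigator.sendBeacon ? navigator.sendBeacon.bind(navigator) : undefined,
//   createXhr: () => new XMLHttpRequest(),
// });
```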

Privacy Compliance: The article highlights personal‑data protection measures, including collecting only the minimum identifiers needed, obtaining user consent, and allowing users to withdraw that consent.
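One way these three measures could translate into SDK behavior is sketched below. The `ConsentGatedTracker` class and its methods are hypothetical. Nothing is reported before consent, only a pseudonymous ID plus event fields are sent, and withdrawing consent stops reporting and discards the ID.

```javascript
// Hypothetical consent gate; `send` delivers a payload, `createId` makes a pseudonymous ID.
class ConsentGatedTracker {
  constructor(send, createId) {
    this.send = send;
    this.createId = createId;
    this.consented = false;
    this.userId = null;
  }

  grantConsent() {
    this.consented = true;
    if (this.userId === null) this.userId = this.createId();
  }

  withdrawConsent() {
    this.consented = false;
    this.userId = null;   // forget the identifier, not just pause reporting
  }

  // Returns true if the event was sent, false if it was dropped for lack of consent.
  track(eventId, params = {}) {
    if (!this.consented) return false;
    this.send({ userId: this.userId, eventId, params });
    return true;
  }
}
```

In a real page, an ID persisted in `localStorage` or a cookie would also need to be deleted in `withdrawConsent`.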

Data Warehouse Scheme: With the tracking layer in place, the design moves on to a data warehouse architecture that addresses three core problems: unified ID mapping, retention modeling with bitmaps, and multi‑layer model design (detail layer dw, light aggregation layer dma, thematic layer dmt, indicator layer da). The article provides SQL snippets for ID unification, retention bitmap handling, and retention metric calculation.

SQL Example – Unified ID Mapping:

SELECT xx, xx1,
       CASE WHEN appid IN(1) THEN 1
            WHEN appid IN(2) THEN 2
            WHEN appid IN(3) THEN 3
            WHEN appid IN(4,5,6,...) THEN 4
            ELSE 0 END AS id_flag,
       CASE WHEN appid IN(1) THEN id1
            WHEN appid IN(2) THEN id2
            WHEN appid IN(3) THEN id3
            WHEN appid IN(4,5,6,...) THEN IF(NVL(params['id1'],'')='',NVL(params['id2'],'NA'),params['id1'])
            ELSE 'NA' END AS unique_id,
       appid
FROM ods_table_name_XXX a
WHERE day='${today}'
  AND hour='${etl_hour}'
  AND appid IN (1,2,3,...)
  AND event_id IN ('XXX|167','XXX|168',...);
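The ID-resolution rule in the CASE expressions can be restated procedurally. The sketch below mirrors the SQL for a single row. The row shape and names are hypothetical, and `GROUP_4_APPIDS` stands in for the elided `4,5,6,...` list. As in the SQL, a missing or empty `params.id1` falls back to `params.id2`, and only a missing `id2` becomes `'NA'`, so an empty-string `id2` passes through as `''`.

```javascript
// Placeholder for the elided appid list "4,5,6,..." in the SQL.
const GROUP_4_APPIDS = new Set([4, 5, 6]);

// Hypothetical row shape: { appid, id1, id2, id3, params: { id1?, id2? } }.
function resolveUniqueId(row) {
  const { appid, params = {} } = row;
  if (appid === 1) return { idFlag: 1, uniqueId: row.id1 };
  if (appid === 2) return { idFlag: 2, uniqueId: row.id2 };
  if (appid === 3) return { idFlag: 3, uniqueId: row.id3 };
  if (GROUP_4_APPIDS.has(appid)) {
    // IF(NVL(params['id1'],'')='', NVL(params['id2'],'NA'), params['id1'])
    const id1 = params.id1 ?? '';
    return { idFlag: 4, uniqueId: id1 === '' ? (params.id2 ?? 'NA') : id1 };
  }
  return { idFlag: 0, uniqueId: 'NA' };
}
```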

SQL Example – Retention Bitmap Update:

-- Bitmap approach: tmp_retain_tag holds up to 8 daily activity bits (newest first);
-- once it is full, it is converted to a 2-digit hex string and prepended to retain_tag
SELECT unique_id,
       appid,
       topic_id,
       IF(LENGTH(tmp_retain_tag)=8, is_active, CONCAT(is_active, tmp_retain_tag)) AS tmp_retain_tag,
       IF(LENGTH(tmp_retain_tag)=8, CONCAT(con_tmp_retain_tag, retain_tag), retain_tag) AS retain_tag,
       is_active
FROM (
    SELECT unique_id,
           appid,
           topic_id,
           tmp_retain_tag,
           CASE WHEN LENGTH(CONV(tmp_retain_tag,2,16))=2 THEN CONV(tmp_retain_tag,2,16)
                ELSE CONCAT('0', CONV(tmp_retain_tag,2,16)) END AS con_tmp_retain_tag,
           retain_tag,
           -- '1' if the user has a row in today's active table, otherwise '0'
           FIRST_VALUE(is_active) OVER(PARTITION BY unique_id,appid,topic_id ORDER BY first_active_day DESC) AS is_active,
           -- keep the historical row, which carries the existing tags; new users only have today's row
           ROW_NUMBER() OVER(PARTITION BY unique_id,appid,topic_id ORDER BY first_active_day ASC) AS rn
    FROM (
        SELECT unique_id, topic_id, appid, first_active_day, last_active_day,
               '0' AS is_active, tmp_retain_tag, retain_tag
        FROM table_active_XX_df
        WHERE day='${last_etl_date}'
        UNION ALL
        SELECT unique_id, topic_id, appid, day AS first_active_day, day AS last_active_day,
               '1' AS is_active, '' AS tmp_retain_tag, '' AS retain_tag
        FROM table_active_XX_hi
        WHERE day='${etl_date}'
    ) a
) b
WHERE rn=1;
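The bitmap update and its decoding can be sketched outside SQL to check the bit ordering. The sketch assumes the layout the query implies: the newest bit sits leftmost in `tmp_retain_tag`, and full 8-bit bytes are prepended to `retain_tag` as two hex digits. The function names are hypothetical. Hive's `CONV` emits uppercase hex while `toString(16)` emits lowercase, and `parseInt` accepts both when decoding.

```javascript
// Mirror of the SQL: prepend today's bit, flushing a full 8-bit buffer
// into retain_tag as two hex digits.
function updateRetainTags(tmpRetainTag, retainTag, isActive) {
  const bit = isActive ? '1' : '0';
  if (tmpRetainTag.length === 8) {
    const hex = parseInt(tmpRetainTag, 2).toString(16).padStart(2, '0');
    return { tmpRetainTag: bit, retainTag: hex + retainTag };
  }
  return { tmpRetainTag: bit + tmpRetainTag, retainTag };
}

// Expand both tags into one bit string; index 0 is the most recent day.
function activityBits(tmpRetainTag, retainTag) {
  let bits = tmpRetainTag;
  for (let i = 0; i < retainTag.length; i += 2) {
    bits += parseInt(retainTag.slice(i, i + 2), 16).toString(2).padStart(8, '0');
  }
  return bits;
}

// True if the user was active k days before the latest processed day.
function wasActiveDaysAgo(tmpRetainTag, retainTag, k) {
  return activityBits(tmpRetainTag, retainTag)[k] === '1';
}
```

A retention check then reduces to a bit lookup. For example, a user whose first active day was N days ago counts toward day-N retention if bit 0 is set.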

The data flow diagram shows the end‑to‑end pipeline from data ingestion, through the DW layers, to MySQL‑based reporting tables used by front‑end dashboards.

Data Presentation: Reports are rendered in a user behavior analysis platform, with MySQL storing the final metrics. Sample dashboards include application overview, user retention, and page analysis visualizations.

Future Outlook: The solution is positioned as a foundation for more advanced analyses such as path analysis, funnel analysis, attribution modeling, and conversion tracking as business needs evolve.
