Big Data · 12 min read

Real‑time OLAP with Flink and Hologres: Replacing Lambda/Kappa Architectures

This article analyzes the limitations of traditional Lambda and Kappa big‑data architectures for online‑school behavior‑feature pipelines and presents a Flink + Hologres solution that provides unified real‑time OLAP and high‑concurrency point‑query services, including design choices, implementation details, and performance results.

Xueersi Online School Tech Team

The online‑school service team needs to generate real‑time behavior features for millions of daily user events, ingesting logs and binlog data into Kafka, processing them with Flink, and delivering both aggregated OLAP results and low‑latency point queries to downstream algorithms.

Traditional big‑data designs such as Lambda (batch + speed layers) and Kappa (stream‑only with replay) are examined; both rely on separate storage systems (HBase, ClickHouse/Druid, Hive) which cause data duplication, format conversion, and high maintenance costs.

To simplify the stack, the team selected a Flink + Hologres architecture: Flink serves as a unified stream‑batch engine, while Hologres acts as a real‑time OLAP store that also provides high‑concurrency point‑lookup, eliminating the need for multiple downstream systems.

Hologres is PostgreSQL‑compatible, supports petabyte‑scale, low‑latency analytics, and offers both OLAP multidimensional queries and fast key‑value style lookups, making it ideal for the described scenario.
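As a hedged sketch of how such a table might be declared (PostgreSQL-compatible DDL; the table and column names here are illustrative, not from the original system), the primary key enables fast point lookups while a distribution key co-locates each user's rows on one shard:

```sql
-- Illustrative schema; table and column names are assumptions.
BEGIN;
CREATE TABLE user_click_features (
    user_id    TEXT NOT NULL,
    subject    TEXT,
    click_cnt  BIGINT,
    update_ts  TIMESTAMPTZ,
    PRIMARY KEY (user_id)
);
-- Row orientation favors high-concurrency key-value style lookups;
-- the distribution key routes all of a user's data to the same shard.
CALL set_table_property('user_click_features', 'orientation', 'row');
CALL set_table_property('user_click_features', 'distribution_key', 'user_id');
COMMIT;
```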

Development workflow: historical data for the past 29 days is pre‑aggregated offline; the current day’s data is processed in real time; the combined 30‑day feature set is written to Hologres. Airflow schedules periodic Hive syncs for offline access.
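The 29-day offline result and today's real-time result are combined before the write to Hologres. A minimal sketch of that merge step, assuming per-user click counts as the feature (class and method names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class FeatureMerge {
    // Merge pre-aggregated 29-day offline counts with today's real-time
    // counts into the 30-day feature set written to Hologres.
    public static Map<String, Long> merge30Day(Map<String, Long> offline29d,
                                               Map<String, Long> today) {
        Map<String, Long> merged = new HashMap<>(offline29d);
        // Users present in both maps get their counts summed;
        // users seen only today are added as-is.
        today.forEach((userId, count) -> merged.merge(userId, count, Long::sum));
        return merged;
    }
}
```

Keeping the offline 29-day aggregate immutable during the day means only today's lightweight partial result has to be recomputed on each real-time update.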

Data ingestion handles roughly 100 million log records per day; to relieve back-pressure on the sink, events are micro-batched with a 10 ms tumbling window before writing:

// Window computation: collect events into 10 ms tumbling windows
SingleOutputStreamOperator<List<ClickFeature>> apply = flatMap
    .timeWindowAll(Time.milliseconds(10)) // 10 ms window
    .apply(new AllWindowFunction<ClickFeature, List<ClickFeature>, TimeWindow>() {
        @Override
        public void apply(TimeWindow timeWindow, Iterable<ClickFeature> iterable,
                          Collector<List<ClickFeature>> collector) throws Exception {
            ArrayList<ClickFeature> clickFeatures = Lists.newArrayList(iterable);
            // Emit the window's events as one batch; skip empty windows
            if (clickFeatures.size() > 0) {
                collector.collect(clickFeatures);
            }
        }
    });

Data processing assigns default values to missing event types and categorises logs into three groups (learning‑center, personal‑page, others) to reduce unnecessary aggregation.
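A minimal sketch of that categorization step (the event-type names and category labels below are illustrative assumptions, not the production mapping):

```java
import java.util.Map;

public class EventCategorizer {
    // Illustrative event-type -> category mapping; real event names differ.
    private static final Map<String, String> CATEGORY = Map.of(
            "lc_course_click", "learning-center",
            "lc_homework_submit", "learning-center",
            "pp_profile_view", "personal-page",
            "pp_avatar_update", "personal-page");

    // Logs with a missing or unknown event type fall into the default
    // "others" bucket, keeping them out of the heavier per-category
    // aggregations.
    public static String categorize(String eventType) {
        if (eventType == null || eventType.isEmpty()) {
            return "others";
        }
        return CATEGORY.getOrDefault(eventType, "others");
    }
}
```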

Batch writes to Hologres use the Holohub sink. Maven dependency:

<dependency>
  <groupId>com.aliyun.datahub</groupId>
  <artifactId>aliyun-sdk-datahub-holo</artifactId>
  <version>2.15.0-SNAPSHOT</version>
</dependency>

The custom Flink sink (a RichSinkFunction) overrides open() and invoke(), and adds a retry helper:

@Override
public void open(Configuration parameters) throws Exception {
    // Build the Hologres (Holohub) client
    datahubClient = buildClient(endpoint, accessId, accessKey);
    // Fetch the schema of the result table
    schema = datahubClient.getTopic(database, tableName).getRecordSchema();
    getRuntimeContext().addAccumulator("outAccumulateInsert", this.outAccumulateInsert);
}

@Override
public void invoke(List<ClickFeature> value, Context context) throws Exception {
    List<RecordEntry> records = new ArrayList<>();
    for (ClickFeature v : value) {
        TupleRecordData recordData = new TupleRecordData(schema);
        RecordEntry dhRecord = new RecordEntry();
        // Copy the feature's fields into the record according to the table schema
        TupleRecordData resultTupleRecordData = setValuesForTableField(recordData, v);
        dhRecord.setRecordData(resultTupleRecordData);
        records.add(dhRecord);
    }
    try {
        PutRecordsResult result = datahubClient.putRecords(database, tableName, records);
        int failed = result.getFailedRecordCount();
        if (failed > 0) {
            // Re-submit only the records that failed
            retry(datahubClient, result.getFailedRecords(), retryTimes, database, tableName);
        }
    } catch (DatahubClientException e) {
        System.out.println("requestId:" + e.getRequestId() + "\tmessage:" + e.getErrorMessage());
    }
}

public static void retry(DatahubClient client, List<RecordEntry> records,
                         int retryTimes, String database, String tableName) {
    // Retry failed records, re-submitting only the still-failing subset each round
    while (retryTimes > 0 && !records.isEmpty()) {
        retryTimes--;
        PutRecordsResult result = client.putRecords(database, tableName, records);
        if (result.getFailedRecordCount() == 0) {
            return; // all records written successfully
        }
        records = result.getFailedRecords();
    }
    if (!records.isEmpty()) {
        System.out.println("retryFailure");
    }
}

Operational notes: each batch write must stay below 4 MB; aggregation is split into dimension joins, field-type partitions, and a two-step 30-day computation (29 days offline plus today's data in real time).
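The 4 MB write limit means an oversized window batch must be split before invoke() submits it. A minimal sketch of such a splitter, operating on already-serialized records (the class name and byte-size accounting are illustrative assumptions):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSplitter {
    // Upper bound per Hologres batch write, per the operational note
    static final long MAX_BATCH_BYTES = 4L * 1024 * 1024;

    // Split serialized records into sub-batches whose total payload
    // stays at or under the 4 MB limit.
    public static List<List<byte[]>> split(List<byte[]> records) {
        List<List<byte[]>> batches = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        long currentBytes = 0;
        for (byte[] record : records) {
            // Flush the current batch if adding this record would exceed the cap
            if (currentBytes + record.length > MAX_BATCH_BYTES && !current.isEmpty()) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(record);
            currentBytes += record.length;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```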

Generated features include basic user attributes, subject‑related fields, and interest signals (e.g., activity participation, comments, shares).

Deployment: online feature service runs via Airflow PostgresOperator with minute‑level updates served by T‑service; offline feature tables are synced to Hive using Airflow DataXOperator. Data older than three months is archived to HDFS.

Conclusion: Using Hologres for both OLAP analysis and point‑lookup removes the need for separate batch systems and KV stores, cuts development and maintenance effort, and achieves stable operation with average query latency around 20 ms, with future goals of sub‑10 ms responses.

Tags: Flink · Streaming · Hologres · Real-time OLAP · Lambda architecture · Kappa architecture
Written by the Xueersi Online School Tech Team, dedicated to innovating and promoting internet education technology.