Design and Architecture of a Full‑Chain Data Warehouse for Information Security
The article presents a comprehensive design of an end‑to‑end data warehouse for information‑security governance, detailing background motivations, multi‑layer data architecture, dimension modeling, bus‑matrix mapping, real‑time (lambda/kappa) processing, data‑dictionary integration, and future directions toward unified streaming‑batch solutions.
Background – In the information‑security business, massive heterogeneous data (features, policies, user behavior) must be analyzed and validated. This requires a "full‑link" data warehouse that integrates data from every business line into a dense, highly integrated data network, turning data into proactive security production capacity.
Data Layering – The warehouse is divided into six layers:
1. Raw Data Layer (RAW) – Snapshot of source‑system data, stored daily with full detail.
2. Basic Data Layer (ODS) – Data organized around business concepts, with standardized names and codes.
3. General Data Layer (DWD) – Fine‑grained layer with light aggregation, built on star or snowflake models; metrics and dimensions are standardized.
4. Aggregated Data Layer (DWS) – Data marts for specific business needs, designed with star or snowflake schemas.
5. Dimension Layer (DIM) – Dimension tables providing rich attributes, historical traceability, and consistency across common dimensions.
6. Temporary Layer (TMP) – Transient tables that reduce computation complexity and improve runtime efficiency.
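The layering above can be sketched in code. The snippet below is a minimal, hypothetical illustration of a layered naming convention and a RAW‑to‑ODS standardization step; the layer prefixes, domain names, and field names are assumptions for illustration, not taken from the article.

```python
# Hypothetical sketch of the six-layer convention and one RAW -> ODS step.
# Layer prefixes, domains, and field names are illustrative assumptions.

LAYERS = ["raw", "ods", "dwd", "dws", "dim", "tmp"]

def table_name(layer: str, domain: str, entity: str) -> str:
    """Build a layered table name, e.g. ods_security_user_login."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer}")
    return f"{layer}_{domain}_{entity}"

def raw_to_ods(raw_row: dict) -> dict:
    """Standardize names and codes when promoting a RAW snapshot row to ODS."""
    return {
        "user_id": str(raw_row["uid"]),        # unify the id type
        "event_code": raw_row["evt"].upper(),  # standardized event code
        "event_date": raw_row["dt"],           # daily partition key, unchanged
    }

print(table_name("ods", "security", "user_login"))
print(raw_to_ods({"uid": 42, "evt": "login", "dt": "2024-05-01"}))
```

The point of the sketch is that each layer only ever reads from the layer below it and applies one well‑defined kind of transformation.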
Dimension Modeling – Two mainstream approaches (normalized vs. dimensional) are compared. Normalized warehouses require heavy upfront work but yield stable long‑term maintenance; dimensional modeling is more agile, suits frequently changing business, and demands less expertise. Four key steps are outlined: selecting business processes, declaring grain, identifying dimensions, and confirming facts.
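The four modeling steps can be captured as a small design record. This is a hedged sketch: the `FactTableDesign` class and the login‑risk example are hypothetical names invented here to illustrate the sequence, not part of the article's system.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the four dimensional-modeling steps as a design
# record. The process, grain, dimension, and fact names are illustrative.

@dataclass
class FactTableDesign:
    business_process: str                             # step 1: select the business process
    grain: str                                        # step 2: declare the grain
    dimensions: list = field(default_factory=list)    # step 3: identify dimensions
    facts: list = field(default_factory=list)         # step 4: confirm facts

login_risk = FactTableDesign(
    business_process="user login risk check",
    grain="one row per login attempt",
    dimensions=["user", "device", "time", "geo"],
    facts=["risk_score", "is_blocked"],
)
print(login_risk.grain)
```

Declaring the grain before choosing dimensions and facts is what keeps the resulting fact table unambiguous.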
Bus Matrix – The bus matrix acts as a map of the warehouse, linking each business process (rows) with common dimensions (columns). It provides a macro view of which processes share which dimensions, enabling quick alignment of data requirements with warehouse structures.
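A bus matrix reduces naturally to a mapping from process to conformed dimensions. The sketch below assumes three hypothetical security processes and shows how the matrix answers the alignment question directly; the process and dimension names are invented for illustration.

```python
# Hypothetical bus-matrix sketch: rows are business processes, columns are
# conformed dimensions; membership in the set marks the matrix cell.
BUS_MATRIX = {
    "login_risk":     {"user", "device", "time"},
    "content_review": {"user", "content", "time"},
    "traffic_audit":  {"device", "time"},
}

def shared_dimensions(*processes: str) -> set:
    """Dimensions conformed across all of the given processes."""
    return set.intersection(*(BUS_MATRIX[p] for p in processes))

print(sorted(shared_dimensions("login_risk", "content_review")))
```

Any dimension returned by `shared_dimensions` must be modeled once, in the DIM layer, so that every process reads the same attribute set.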
Overall Architecture – The warehouse is split into three logical parts:
General warehouse: stores cross‑business capability data (e.g., hunter‑risk system, cloud authentication).
Business warehouse: built for specific industry‑level analyses.
Subject warehouse: unified, cross‑business subject areas (traffic, content, user, etc.) based on consistent dimensions.
This three‑tier design follows an IKEA‑style analogy: a shared public floor (the general warehouse) serves developers, while dedicated floors (the business warehouses) serve analysts.
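The three‑way split can be expressed as a routing rule. The function below is a simplified sketch of the division described above, with an assumed precedence (subject areas first, then cross‑business capability data, then business‑specific data); the article does not specify actual routing logic.

```python
from typing import Optional

# Hypothetical routing sketch for the three-part architecture; the precedence
# order is an assumption made for illustration.
def route_dataset(cross_business: bool, subject_area: Optional[str]) -> str:
    if subject_area is not None:
        return "subject warehouse"    # unified cross-business subject areas
    if cross_business:
        return "general warehouse"    # shared capability data
    return "business warehouse"       # industry-specific analyses

print(route_dataset(cross_business=True, subject_area=None))
```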
Real‑Time Evolution – Discusses Lambda (batch + stream) and Kappa (stream‑only) architectures. Lambda offers flexibility but incurs the maintenance cost of two engines and risks inconsistency between batch and stream results; Kappa simplifies the stack by replaying a message queue (e.g., Kafka) through a single stream engine such as Flink, which also enables stream‑to‑Hive writes and automatic small‑file compaction.
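The core Kappa idea is that reprocessing is just a replay of the same log through the same job. The toy sketch below makes that concrete: an in‑memory list stands in for a Kafka topic, and a plain function stands in for the Flink job; everything here is an illustrative assumption, not the article's actual pipeline.

```python
from collections import Counter

# Hypothetical Kappa-style sketch: one stream job serves both live serving
# and backfill by replaying the same log from an earlier offset.
# The list stands in for a Kafka topic; no real stream engine is involved.

log = [("u1", "login"), ("u2", "login"), ("u1", "logout"), ("u2", "login")]

def stream_job(events, from_offset=0):
    """Count events per user; rebuilding state is just a replay."""
    state = Counter()
    for user, _event in events[from_offset:]:
        state[user] += 1
    return state

live = stream_job(log)                    # the always-on job
rebuilt = stream_job(log, from_offset=0)  # a "backfill" replays from offset 0
assert live == rebuilt
```

Because the replayed result is identical to the live result by construction, Kappa removes the batch/stream inconsistency that Lambda must reconcile by hand.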
Data Dictionary – Serves as the core metadata service (Hive Metastore) that supplies schema information to streaming platforms, enabling zero‑code configuration for feature extraction, model training, and online inference.
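The "zero‑code" effect comes from letting schema metadata, rather than hand‑written parsing code, drive feature extraction. The sketch below uses a plain dict as a stand‑in for the Hive Metastore; the table name, column names, and type set are assumptions for illustration.

```python
# Hypothetical data-dictionary sketch: a registry (standing in for the Hive
# Metastore) supplies column schemas, so a feature job needs only a table
# name instead of per-table parsing code. Names and types are illustrative.

SCHEMAS = {
    "dwd_security_login": [("user_id", "string"), ("risk_score", "double")],
}

CASTS = {"string": str, "double": float}

def extract_features(table: str, row: list) -> dict:
    """Zip a raw row with the registered schema to get named, typed features."""
    schema = SCHEMAS[table]
    return {name: CASTS[typ](value) for (name, typ), value in zip(schema, row)}

print(extract_features("dwd_security_login", ["u1", "0.87"]))
```

Adding a new feature source then means registering a schema, not writing a new extractor.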
Future Outlook – The team is exploring data‑lake‑based stream‑batch integration to replace the current Hive + Kafka pattern, and addressing emerging security challenges such as unstructured image/text attacks, requiring new data‑structuring and linkage solutions.
58 Tech – Official tech channel of 58, a platform for tech innovation, sharing, and communication.