How to Build a Unified Big Data Security Platform with Ranger and Custom Authorization

This article explains the design and implementation of a unified data security control platform that protects user privacy and corporate data across multiple big‑data components (Hive, Hetu, GaussDB) by integrating Apache Ranger, custom authorization APIs, asynchronous processing, distributed locking, and SDK‑based authentication to achieve fine‑grained, one‑stop permission management.

Xingsheng Youxuan Technology Community

Aug 30, 2022

1. Overview

As the mobile internet has matured, e-commerce has become a major consumption channel and now generates massive volumes of data. Personal privacy and corporate business data must be protected in line with privacy laws and confidentiality requirements: core tables should be accessible only after authorization, and sensitive fields must be masked.

2. Industry Analysis

Popular permission-control products for big data include Apache Ranger and Apache Sentry. Sentry, developed by Cloudera, provides fine-grained control for HDFS, Hive and Impala. Ranger, which originated at Hortonworks, supports a broader set of components such as HDFS, Hive, HBase, YARN, Storm, Knox, Kafka, Solr and NiFi.

3. Business Requirements and System Constraints

Although Ranger offers fine‑grained control, a custom platform is needed because:

Ranger only manages Hadoop ecosystem data and cannot control MPP‑style OLAP databases like GaussDB or StarRocks.

Ranger Admin’s UI does not provide rapid grant/revoke workflows, making it hard to locate a user’s database and table permissions.

Different MPP databases require separate authorization commands, increasing management cost.

The platform must be easily integrable with other systems.

The design goals are a one-stop solution, a platform that is itself secure, and a service-oriented architecture.

4. Architecture and Implementation

4.1 Overall Technical Architecture

The platform consists of four layers:

REST API layer: exposes user, group, DB cluster, audit, approval and permission APIs.

DB permission service layer: implements the business logic for the different databases.

Asynchronous authorization processing layer: queues concurrent requests to avoid permission conflicts.

User-access SDK layer: provides a data-source plugin, so developers only need a Kylin user ID to obtain a JDBC connection pool.
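The idea behind the asynchronous authorization processing layer can be illustrated with a minimal sketch: concurrent callers only enqueue requests, and a single worker applies them one at a time, so two grants never modify the same permission record simultaneously. The names `AuthRequest` and `AsyncAuthorizer` and the in-memory permission map are assumptions made for illustration; the article does not describe the platform's actual queue implementation.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthRequest:
    action: str    # "grant" or "revoke"
    user: str
    resource: str  # e.g. "dim.table1" (hypothetical resource key)


class AsyncAuthorizer:
    """Applies authorization requests one at a time on a single worker thread."""

    def __init__(self):
        self._queue = queue.Queue()
        self.permissions = {}  # resource -> set of users (in-memory stand-in)
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, request):
        # Callers return immediately; the worker applies the change later.
        self._queue.put(request)

    def _run(self):
        while True:
            req = self._queue.get()
            try:
                users = self.permissions.setdefault(req.resource, set())
                if req.action == "grant":
                    users.add(req.user)
                elif req.action == "revoke":
                    users.discard(req.user)
            finally:
                self._queue.task_done()

    def wait_idle(self):
        # Block until every queued request has been applied.
        self._queue.join()
```

Because only the worker thread touches `permissions`, no lock is needed inside a single process. Across several service instances a shared lock would still be required, which is where the distributed lock mentioned in section 4.3 comes in.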

4.2 User Relationship

Each user belongs to a single virtual group, each group maps to exactly one DB user per cluster, and a DB user can belong to only one group. This structure lets the platform locate the correct DB user for a Kylin login and enforce permissions at the database level.
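The relationship rules can be made concrete with a small in-memory model. This is only a sketch: the class name `Directory`, its methods, and the choice to treat DB users as unique per cluster are assumptions, since the article does not show the platform's schema.

```python
from dataclasses import dataclass, field


@dataclass
class Directory:
    """In-memory model of the user -> group -> DB user mapping (illustrative only)."""

    user_group: dict = field(default_factory=dict)     # platform user -> group
    group_db_user: dict = field(default_factory=dict)  # (group, cluster) -> DB user
    db_user_group: dict = field(default_factory=dict)  # (cluster, DB user) -> group

    def add_user(self, user, group):
        existing = self.user_group.get(user)
        if existing is not None and existing != group:
            raise ValueError(f"{user} already belongs to group {existing}")
        self.user_group[user] = group

    def bind_db_user(self, group, cluster, db_user):
        # A DB user may be owned by only one group.
        owner = self.db_user_group.get((cluster, db_user))
        if owner is not None and owner != group:
            raise ValueError(f"{db_user} on {cluster} already belongs to {owner}")
        # A group may have only one DB user per cluster.
        current = self.group_db_user.get((group, cluster))
        if current is not None and current != db_user:
            raise ValueError(f"{group} already maps to {current} on {cluster}")
        self.group_db_user[(group, cluster)] = db_user
        self.db_user_group[(cluster, db_user)] = group

    def resolve(self, user, cluster):
        """Return the DB user that a platform user acts as on a given cluster."""
        group = self.user_group[user]
        return self.group_db_user[(group, cluster)]
```

With these constraints in place, `resolve("alice", "gaussdb-prod")` always yields a single DB user, which is what allows permissions to be enforced by the database itself.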

4.3 Authorization Management

Permission granularity differs among components (see Figure 5). The platform defines a naming convention for Ranger policies (e.g., ALL#dim#all#all for a database-level policy, ALL#dim#table1#all for a table-level policy) and stores the authorized users in the users array of each policy.

For Hive/Hetu, Ranger policies are created or updated asynchronously (Figure 7) and synchronized to the Agent Plugin every 30 seconds.

For GaussDB, Ranger cannot manage permissions, so the platform invokes GaussDB's native grant/revoke APIs. To keep database-level and table-level permissions separate, each table grant also updates the corresponding database grant and vice versa. These updates are processed through an asynchronous queue guarded by a distributed lock to avoid race conditions (Figures 8 and 9).
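The policy naming convention can be sketched as follows. The two example names suggest a `privilege#database#table#column` layout with `all` as the wildcard; that reading, the helper names, and the simplified policy dictionary are assumptions made for illustration. A real Ranger policy payload carries more fields than shown here.

```python
def policy_name(privilege, database, table="all", column="all"):
    """Build a policy name in the assumed privilege#database#table#column layout."""
    return "#".join([privilege, database, table, column])


def build_policy(privilege, database, table="all", column="all"):
    """Return a simplified policy record with an empty users array."""
    return {
        "name": policy_name(privilege, database, table, column),
        "resources": {"database": database, "table": table, "column": column},
        "users": [],
    }


def grant(policy, user):
    # Idempotent: granting twice leaves a single entry in the users array.
    if user not in policy["users"]:
        policy["users"].append(user)


def revoke(policy, user):
    # Idempotent: revoking an absent user is a no-op.
    if user in policy["users"]:
        policy["users"].remove(user)


db_policy = build_policy("ALL", "dim")
table_policy = build_policy("ALL", "dim", "table1")
grant(table_policy, "grp_analysts")
```

Since the name is derived from the resource, the platform can find the policy for a given database or table by name and then edit only its users array, rather than searching through all policies.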

Open APIs are provided for external systems: a REST token‑based authentication API, a virtual‑user API that wraps MRS security calls, a Ranger authorization API for Hive/Presto, and a GaussDB authorization API. The SDK supplies a data‑source plugin that maps a Kylin user ID to a pre‑allocated connection pool; actual permission checks are delegated to the underlying database’s own authentication module.
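The data-source plugin's lookup can be sketched as a registry that resolves a user ID to the pool pre-allocated for that user's DB user. `ConnectionPool`, `DataSourcePlugin`, and the injected connection factory are hypothetical stand-ins; the real SDK hands out JDBC connection pools, which this Python sketch only imitates.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool filled up front by a caller-supplied factory."""

    def __init__(self, factory, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def connection(self, timeout=5.0):
        # Raises queue.Empty if no connection frees up within the timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)


class DataSourcePlugin:
    """Maps a platform user ID to the pool of the DB user it acts as."""

    def __init__(self, user_to_db_user, pools):
        self._user_to_db_user = dict(user_to_db_user)
        self._pools = dict(pools)  # DB user -> ConnectionPool

    def pool_for(self, user_id):
        db_user = self._user_to_db_user[user_id]  # KeyError for unknown users
        return self._pools[db_user]
```

The plugin itself performs no permission check. It only selects which DB user's connections are used, and the database decides whether each statement is allowed.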

4.4 Authentication

The authentication SDK integrates with Kylin, token‑based REST, and other identity providers. Applications catch database exceptions to determine whether a user has the required permission (Figure 9).
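One way an application might turn database exceptions into a permission decision is sketched below. `DriverError` is a hypothetical stand-in for whatever exception the real driver raises, and the sketch assumes the driver exposes an SQLSTATE. The code `42501` is PostgreSQL's insufficient-privilege code and may not apply to every engine, so treat it as an assumption to verify per database.

```python
class DriverError(Exception):
    """Hypothetical stand-in for a driver exception that carries an SQLSTATE."""

    def __init__(self, message, sqlstate=None):
        super().__init__(message)
        self.sqlstate = sqlstate


class PermissionDenied(Exception):
    """Raised when the database rejected a statement for lack of privilege."""


# Assumed code for "insufficient privilege"; confirm it for each target engine.
INSUFFICIENT_PRIVILEGE = "42501"


def run_checked(execute, sql):
    """Run sql through execute, translating privilege errors into PermissionDenied."""
    try:
        return execute(sql)
    except DriverError as exc:
        if exc.sqlstate == INSUFFICIENT_PRIVILEGE:
            raise PermissionDenied(str(exc)) from exc
        raise  # Other database errors propagate unchanged.
```

Callers can then handle `PermissionDenied` separately, for example by prompting the user to request access, while still surfacing unrelated failures as ordinary errors.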

5. Summary

The security control platform delivers unified, fine‑grained authorization for big‑data resources, reduces operational overhead, prevents data leakage, and enables “one‑person‑one‑view” access control across heterogeneous data stores.

Data security control platform architecture diagram
