
Using Apache Phoenix on CDH HBase: Installation, Configuration, and Secondary Index Creation

This article explains how to integrate Apache Phoenix with CDH-based HBase: choosing a Phoenix version, installing the parcel, configuring HBase, using the command line, mapping existing HBase tables into Phoenix schemas and views, building secondary indexes, and comparing index types for performance.


Apache Phoenix is a high‑performance SQL engine built on top of HBase, providing both OLTP and OLAP capabilities and offering secondary‑index support for HBase tables.

Phoenix introduction

CDH HBase integration with Phoenix

Creating secondary indexes on HBase via Phoenix

Overview of Phoenix index types

Phoenix Introduction

Phoenix stores data in HBase and exposes a standard JDBC API. It supports standard SQL, full ACID transactions, and provides a secondary‑index solution for HBase. Phoenix also integrates with components such as Spark, Hive, and Flume.

CDH HBase Integration – Version Information

For CDH platforms, Apache Phoenix 4.14.0 is released for CDH 5.11.2, 5.12.2, 5.13.2, and 5.14.2 as source, binary tarball, and parcel packages. The parcel format is recommended for clusters managed by Cloudera Manager.

Newer CDH releases can follow the official Cloudera announcement of Apache Phoenix support; older releases can use the CLABS_PHOENIX parcel (at most Phoenix 4.7.0) from the Cloudera Labs parcel repository.

Installation

Download the appropriate parcel package and place it on an HTTP server, so that the directory can be added to Cloudera Manager as a remote parcel repository URL; then download, distribute, and activate the parcel through Cloudera Manager.

[root@hadoop-01 /var/www/html/phoenix/4.14.0]$ ll
total 300524
-rw-r--r-- 1 root root 307722240 Feb  3 19:30 APACHE_PHOENIX-4.14.0-cdh5.11.2.p0.3-el7.parcel
-rw-r--r-- 1 root root       178 Feb  3 19:28 APACHE_PHOENIX-4.14.0-cdh5.11.2.p0.3-el7.parcel.sha512
-rw-r--r-- 1 root root      5081 Feb  3 19:30 manifest.json

Configuration

After activating the parcel, add the following properties to hbase-site.xml (both server-side and in client configurations, e.g. via the HBase safety valve in Cloudera Manager) to enable Phoenix secondary indexes and namespace mapping, then restart HBase:

<property>
  <name>hbase.regionserver.wal.codec</name>
  <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>
<property>
  <name>phoenix.schema.isNamespaceMappingEnabled</name>
  <value>true</value>
</property>
<property>
  <name>phoenix.schema.mapSystemTablesToNamespace</name>
  <value>true</value>
</property>

Command‑Line Usage

After installation through Cloudera Manager, the Phoenix command-line scripts are already on the PATH. Use phoenix-sqlline.py to start an interactive SQL session:

[root@hadoop-01 ~]$ phoenix-
phoenix-performance.py  phoenix-psql.py  phoenix-sqlline.py  phoenix-utils.py

Running phoenix-sqlline.py for the first time initializes Phoenix (creating its SYSTEM tables) and drops you into the sqlline prompt.
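The script can also be pointed at an explicit ZooKeeper quorum. A sketch, assuming hadoop-01 hosts ZooKeeper on the default port (the host name is this article's example environment, not a requirement):

```
[root@hadoop-01 ~]$ phoenix-sqlline.py hadoop-01:2181
```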

Mapping an Existing HBase Table

Assume an existing HBase table ns1000:operate_log with roughly 2.8 million rows and 14 columns in a single column family f.

hbase(main):017:0> count 'ns1000:operate_log', INTERVAL => 100000
2799827 row(s) in 173.4200 seconds

Creating a Phoenix Schema and View

0: jdbc:phoenix> create schema if not exists "ns1000";
0: jdbc:phoenix> use "ns1000";
0: jdbc:phoenix> create view "operate_log"(
  pk varchar primary key,
  "f"."appVersion" varchar,
  "f"."city" varchar,
  "f"."lat" varchar,
  "f"."lon" varchar,
  "f"."memberType" varchar,
  "f"."time" varchar);

Query the view to verify data:

0: jdbc:phoenix> select * from "operate_log" limit 1;
+------------------------+-------------+-------+---------------------+---------------------+-------------+----------------+
| pk                     | appVersion  | city  | lat                 | lon                 | memberType  | time           |
+------------------------+-------------+-------+---------------------+---------------------+-------------+----------------+
| 1538216707720          | 2.22.0      | 重庆市 | 29.577587127685547  | 106.50493621826172  | 0           | 1538216707720  |
+------------------------+-------------+-------+---------------------+---------------------+-------------+----------------+

Creating a Secondary Index on the time Column

0: jdbc:phoenix> create index index_operate_log_time on "operate_log" ("f"."time");
2799827 rows affected (95.814 seconds)

With the index in place, a count over a time range on "f"."time" completes in about 50 ms:

0: jdbc:phoenix> select count(*) from "operate_log" where "f"."time" between '1538216707720' and '1538223834000';
+-----------+
| COUNT(1) |
+-----------+
| 5883     |
+-----------+
1 row selected (0.049 seconds)

Phoenix Index Types

Covered Indexes

A covered index stores selected columns directly in the index table, allowing queries that only need those columns to be satisfied without accessing the base table.
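As a sketch against the example table above (the index name and the choice of "f"."city" as the covered column are illustrative, not from the original):

```sql
-- Covered index: "f"."city" is stored in the index table itself, so a
-- query filtering on "time" and selecting only "city" can be answered
-- entirely from the index, without reading the base table.
CREATE INDEX index_operate_log_time_cover
    ON "operate_log" ("f"."time")
    INCLUDE ("f"."city");
```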

Functional Indexes

Introduced in Phoenix 4.3, functional indexes allow arbitrary expressions (e.g., SUBSTR("f"."time", 1, 10)) to be indexed, so queries whose predicates use the same expression can be served from the index.
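A sketch of a functional index on the example table (the index name is illustrative):

```sql
-- Functional index on an expression rather than a raw column.
CREATE INDEX index_operate_log_time_prefix
    ON "operate_log" (SUBSTR("f"."time", 1, 10));

-- A query whose predicate uses the same expression can hit the index:
SELECT count(*) FROM "operate_log"
 WHERE SUBSTR("f"."time", 1, 10) = '1538216707';
```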

Global Indexes

Global indexes, which CREATE INDEX builds by default, store index data in a separate index table. Reads that can be served from the index are fast, but every write to the base table must also update the index table, adding network overhead on writes; global indexes therefore suit read-heavy workloads.

Local Indexes

Since Phoenix 4.8.0, local indexes keep index data in a shadow column family of the data table itself, so index updates stay on the same region server as the data. This reduces write overhead, but reads must consult every region to locate index entries, which can degrade read performance; local indexes are better suited to write-heavy scenarios.
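Creating a local index only differs by the LOCAL keyword (the index name here is illustrative):

```sql
-- Local index: index data lives in a shadow column family of
-- "operate_log" rather than in a separate index table.
CREATE LOCAL INDEX local_index_operate_log_time
    ON "operate_log" ("f"."time");
```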

Summary

Phoenix provides a SQL layer on top of HBase, offering standard SQL queries and powerful secondary‑index capabilities. Global indexes favor read‑heavy use cases, while local indexes are better for write‑heavy workloads. Use EXPLAIN to verify whether a query utilizes an index.
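For example, prefixing a query with EXPLAIN shows whether the plan is a range scan over the index table or a full scan of the base table (output varies by cluster, so none is shown here):

```sql
EXPLAIN SELECT count(*) FROM "operate_log"
 WHERE "f"."time" BETWEEN '1538216707720' AND '1538223834000';
```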

References:

HBase integration with Phoenix – view vs. table

How to create secondary indexes in Phoenix on CDH HBase

Cloudera announces Apache Phoenix support in CDH

Tags: Big Data, SQL, HBase, Apache Phoenix, CDH, Secondary Index
Written by Big Data Technology Architecture – Exploring Open Source Big Data and AI Technologies