
Deep Dive into Phoenix Index Creation, Maintenance, and SQL Compilation

This article provides a detailed technical analysis of Phoenix's native index creation and maintenance mechanisms, the underlying source code for index building, the role of coprocessors, and the complete SQL compilation pipeline from parsing to execution, highlighting how hints and optimizers influence index usage.

DataFunTalk

Jun 28, 2019


The article begins by explaining that Phoenix's native indexes do not support full‑text search, so adding this capability requires extending the source code. It walks through the index creation flow, starting with the org.apache.phoenix.compile.CreateIndexCompiler class, whose BaseMutationPlan ultimately executes client.createIndex(create, splits).

Key code snippets illustrate how the index creation plan builds an UPSERT SELECT statement, modifies column types to support null values, adds row‑key columns, and prefixes column names with the column‑family name. The article then summarizes the three main steps of index creation: inserting metadata into Phoenix system tables, generating initial index data, and maintaining the index via coprocessors.
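The UPSERT SELECT construction described above can be sketched in a few lines. This is a minimal illustration, not Phoenix's own code: the class IndexPopulationSketch, its Column type, and the table and column names are all hypothetical, and the "family:name" naming rule (with an empty family for row-key columns) is stated here as an assumption about how the prefixing works.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Illustrative sketch only; this is not Phoenix source code.
final class IndexPopulationSketch {

    /** A data-table column. family is null for row-key columns. */
    static final class Column {
        final String family;
        final String name;

        Column(String family, String name) {
            this.family = family;
            this.name = name;
        }
    }

    /** Assumed naming rule: "<family>:<name>", or ":<name>" for row-key columns. */
    static String indexColumnName(Column c) {
        return (c.family == null ? "" : c.family) + ":" + c.name;
    }

    /** Builds the statement that copies existing rows into the index table. */
    static String buildUpsertSelect(String dataTable, String indexTable,
                                    List<Column> indexed, List<Column> rowKey) {
        // Indexed columns come first, then the data table's row-key columns.
        List<Column> all = new ArrayList<>(indexed);
        all.addAll(rowKey);

        StringJoiner targets = new StringJoiner(", ");
        StringJoiner sources = new StringJoiner(", ");
        for (Column c : all) {
            targets.add("\"" + indexColumnName(c) + "\"");
            sources.add(c.family == null ? c.name : c.family + "." + c.name);
        }
        return "UPSERT INTO " + indexTable + " (" + targets + ") SELECT "
                + sources + " FROM " + dataTable;
    }
}
```

Called with data table T, index table IDX, one indexed column CF.V1, and one row-key column ID, the method returns a single statement that selects CF.V1 and ID from T and writes them to the index columns "CF:V1" and ":ID".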

For index maintenance, the org.apache.phoenix.hbase.index.Indexer coprocessor is examined. It intercepts batch Put / Delete operations and delegates them to an IndexBuildManager, which in turn uses a PhoenixIndexBuilder implementation of the IndexBuilder interface to produce index updates.
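The delegation chain can be pictured with simplified stand-ins. In the sketch below, the nested types only borrow the role names Indexer, IndexBuildManager, and IndexBuilder; their fields and the method names onBatch and buildUpdates are hypothetical, and the toy row-key layout is an assumption made for illustration. The real classes also handle HBase cells, timestamps, and failure policies that are omitted here.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for the roles named above; not the real Phoenix or HBase types.
final class IndexMaintenanceSketch {

    /** Stand-in for a Put or Delete touching one indexed value on one row. */
    static final class Mutation {
        final String table;
        final String rowKey;
        final String value;     // indexed value; null on index-table mutations
        final boolean isDelete;

        Mutation(String table, String rowKey, String value, boolean isDelete) {
            this.table = table;
            this.rowKey = rowKey;
            this.value = value;
            this.isDelete = isDelete;
        }
    }

    /** Builder role: turns one data-table mutation into index-table mutations. */
    interface IndexBuilder {
        List<Mutation> buildUpdates(Mutation dataMutation);
    }

    /** Manager role: fans a whole batch out to the builder. */
    static final class IndexBuildManager {
        private final IndexBuilder builder;

        IndexBuildManager(IndexBuilder builder) {
            this.builder = builder;
        }

        List<Mutation> buildUpdates(List<Mutation> batch) {
            List<Mutation> out = new ArrayList<>();
            for (Mutation m : batch) {
                out.addAll(builder.buildUpdates(m));
            }
            return out;
        }
    }

    /** Coprocessor role: sees each batch and delegates to the manager. */
    static final class Indexer {
        private final IndexBuildManager manager;

        Indexer(IndexBuildManager manager) {
            this.manager = manager;
        }

        List<Mutation> onBatch(List<Mutation> batch) {
            return manager.buildUpdates(batch);
        }
    }

    /** Toy builder: the index row key is "<value>|<data row key>". */
    static IndexBuilder singleColumnBuilder(String indexTable) {
        return m -> {
            List<Mutation> updates = new ArrayList<>();
            updates.add(new Mutation(indexTable, m.value + "|" + m.rowKey, null, m.isDelete));
            return updates;
        };
    }
}
```

Keeping the builder behind an interface is what lets the manager and the coprocessor hook stay generic while the Phoenix-specific logic lives in one implementation.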

The article then shifts to the SQL compilation process. It describes the JDBC driver hierarchy (PhoenixDriver, PhoenixConnection, PhoenixStatement) and shows how PhoenixStatement.executeQuery(String sql) parses the SQL using PhoenixStatementParser, which wraps the ANTLR‑generated PhoenixSQLParser. The parsed statement becomes an ExecutableSelectStatement, which QueryCompiler then compiles into a QueryPlan.
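From the caller's side, this whole pipeline sits behind the standard java.sql interfaces. The following is a minimal sketch of that entry point, assuming a reachable cluster and the Phoenix client jar on the classpath; the class PhoenixQuerySketch, the table MY_TABLE, the column V1, and the ZooKeeper port 2181 are placeholders, not values taken from the article.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Caller-side sketch; uses only java.sql, so the Phoenix classes stay implicit.
final class PhoenixQuerySketch {

    /** Builds a URL of the form jdbc:phoenix:<zookeeper quorum>:<port>. */
    static String jdbcUrl(String zkQuorum, int zkPort) {
        return "jdbc:phoenix:" + zkQuorum + ":" + zkPort;
    }

    /** Runs a query and returns the first column of every row as a string. */
    static List<String> firstColumn(Connection conn, String sql) throws SQLException {
        List<String> values = new ArrayList<>();
        // executeQuery is where parsing, compilation, and execution are triggered.
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                values.add(rs.getString(1));
            }
        }
        return values;
    }

    /** Needs a live cluster; MY_TABLE and V1 are hypothetical names. */
    static List<String> example(String zkQuorum) throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl(zkQuorum, 2181))) {
            return firstColumn(conn, "SELECT V1 FROM MY_TABLE WHERE V1 = 'a'");
        }
    }
}
```

With the Phoenix driver registered, the Connection and Statement returned here would be the PhoenixConnection and PhoenixStatement described above, so the single executeQuery call is the point at which the parser and QueryCompiler run.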

During compilation, the optimizer processes hints. The Hint enum nested in HintNode includes INDEX, and the QueryOptimizer rewrites the statement using IndexStatementRewriter.translate and IndexExpressionParseNodeRewriter to replace data‑table column references with index columns. This rewriting lets Phoenix generate a ScanPlan that reads from the index table instead of the base table.
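The core of the rewrite is a tree walk that swaps each data-table column reference for its index-column counterpart. The sketch below shows that idea on a tiny hand-built expression tree. The Node, ColumnRef, Literal, and Equals types are hypothetical and far simpler than Phoenix's parse nodes, and the mapping from V1 to "CF:V1" is an assumed example rather than output captured from Phoenix.

```java
import java.util.Map;

// Toy expression tree and rewriter; not Phoenix's parse-node classes.
final class IndexRewriteSketch {

    interface Node {
        String toSql();
    }

    static final class ColumnRef implements Node {
        final String name;

        ColumnRef(String name) {
            this.name = name;
        }

        public String toSql() {
            return "\"" + name + "\"";
        }
    }

    static final class Literal implements Node {
        final String value;

        Literal(String value) {
            this.value = value;
        }

        public String toSql() {
            return "'" + value + "'";
        }
    }

    static final class Equals implements Node {
        final Node left;
        final Node right;

        Equals(Node left, Node right) {
            this.left = left;
            this.right = right;
        }

        public String toSql() {
            return left.toSql() + " = " + right.toSql();
        }
    }

    /** Returns a copy of the tree with mapped column references replaced. */
    static Node rewrite(Node node, Map<String, String> dataToIndexColumn) {
        if (node instanceof ColumnRef) {
            String mapped = dataToIndexColumn.get(((ColumnRef) node).name);
            // Columns the index does not cover are left untouched.
            return mapped == null ? node : new ColumnRef(mapped);
        }
        if (node instanceof Equals) {
            Equals eq = (Equals) node;
            return new Equals(rewrite(eq.left, dataToIndexColumn),
                              rewrite(eq.right, dataToIndexColumn));
        }
        return node; // literals pass through unchanged
    }
}
```

Rewriting the predicate "V1" = 'a' with a map from V1 to CF:V1 yields "CF:V1" = 'a', which can then be evaluated against the index table. In Phoenix SQL, a query can request a particular index with a hint comment of the form /*+ INDEX(MY_TABLE MY_INDEX) */ placed directly after SELECT, where both names are placeholders.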

Finally, the article notes that while the analysis focuses on simple SELECT queries, the same mechanisms apply to more complex statements, and extending Phoenix for full‑text search would involve modifying the DDL grammar, the factory methods, and the optimizer’s rewrite logic.


