Practical Guide to Querying HBase with Python happybase and JPype
This tutorial walks through setting up the Python happybase library and installing JPype for Java integration, then works through end-to-end code examples for connecting to an HBase Thrift server, generating row keys via Java utilities, querying data, and handling type conversions.
1. Introduction
Python can interact with HBase through the Thrift protocol, which requires the HBase ThriftServer to be running. The happybase library provides a Pythonic wrapper around Thrift, while jpype enables calling Java code (e.g., custom row‑key generators) from Python.
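Because every happybase call goes through the ThriftServer, it can help to confirm that the server is reachable before debugging anything else. The following is a minimal sketch using only the standard library. The helper name thrift_port_open is hypothetical, and 9090 is simply the port used in the connection example later in this guide. The check only shows that something accepts TCP connections on that port, not that it speaks Thrift.

```python
import socket


def thrift_port_open(host, port=9090, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

A call such as thrift_port_open('thriftserver_ip') returning False suggests checking that the ThriftServer process is started and that the port is not blocked.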
2. Environment Preparation
2.1 Install happybase
First check whether happybase is already installed. If it is not, install it via pip, which also pulls in the thriftpy2 dependency.
# pip install happybase
For offline environments, download the thriftpy2 and happybase source packages and install them manually:
# pip install thriftpy2-0.4.8.tar.gz
# pip install happybase-1.2.0.tar.gz
2.2 Install jpype
Install numpy first so that JPype can use it, then install the JPype wheel or source package.
# pip install numpy
# pip install JPype1-0.7.0.tar.gz
On Windows, choose the .whl file that matches the Python version and architecture (32-bit or 64-bit), and make sure the wheel package is installed.
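One way to verify the installation is to ask the interpreter whether each module can be found. This is a minimal sketch, and missing_modules is a hypothetical helper name. Note that the JPype1 package is imported under the name jpype.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # happybase and thriftpy2 import under their package names; JPype1 imports as "jpype".
    missing = missing_modules(["happybase", "thriftpy2", "jpype"])
    print("missing:", missing or "none")
```

An empty result means all three modules are visible to the Python interpreter that ran the check, which matters when several Python installations coexist on one machine.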
3. Practice
3.1 Using happybase to query data
Create a connection to the Thrift server (replace 'thriftserver_ip' with the actual address):
import happybase
connection = happybase.Connection('thriftserver_ip', 9090, table_prefix=b'ns1', table_prefix_separator=b':')
Obtain a table object and fetch a row by its key:
table = connection.table('tablename')
# Example row key; section 3.2 shows how to generate one.
row_key = b'\x01\x91!\x02\x00\x00\x00\x04007720181210'
row = table.row(row_key)
if row:  # an empty dict means the row was not found
    print(row[b'f:column1'])  # bytes output
    print(row.get(b'f:column1', b'').decode())  # string output
Close the connection when finished:
connection.close()
happybase also supports scan, put, delete, and batch operations; refer to the official documentation for details.
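As a rough illustration of those operations, the sketch below wraps batch, scan, and delete calls in small helpers that accept a happybase table object. The helper names are hypothetical, and the decoding step assumes that column names and values are UTF-8 text, which may not hold for binary data.

```python
def upsert_rows(table, rows):
    """Write many rows in one batch; `rows` maps row key (bytes) to {column: value}."""
    with table.batch() as batch:  # queued mutations are sent when the block exits
        for row_key, data in rows.items():
            batch.put(row_key, data)


def scan_prefix(table, prefix, limit=10):
    """Return up to `limit` (row_key, decoded_columns) pairs whose key starts with `prefix`."""
    results = []
    for row_key, data in table.scan(row_prefix=prefix, limit=limit):
        # Assumes UTF-8 text in column names and values.
        decoded = {col.decode(): val.decode() for col, val in data.items()}
        results.append((row_key, decoded))
    return results


def delete_column(table, row_key, column):
    """Delete a single column (e.g. b'f:column1') from one row."""
    table.delete(row_key, columns=[column])
```

With an open connection, these would be called as, for example, upsert_rows(table, {b'key1': {b'f:column1': b'value'}}) followed by scan_prefix(table, b'key'). Row keys, column names, and values are all passed as bytes.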
3.2 Invoking Java classes for row‑key generation
Prepare the required JAR files and start the JVM:
import os
import jpype
from jpype import JClass
jars = ["/app/lib/custom-1.2.0.jar", "/app/lib/commons-codec-1.9.jar"]
# os.pathsep is ':' on Linux/macOS and ';' on Windows
jvm_classpath = "-Djava.class.path={}".format(os.pathsep.join(jars))
if not jpype.isJVMStarted():
    jpype.startJVM(jpype.getDefaultJVMPath(), "-ea", jvm_classpath)
Import the Java utility classes and generate the row key:
MD5Util = JClass("com.example.MD5Util")
BytesUtil = JClass("com.example.BytesUtil")
# pk_id is the input string to hash, supplied by the caller
rowkey_bs = MD5Util.getHashBytes(BytesUtil.toBytes(pk_id))
Hex = JClass("org.apache.commons.codec.binary.Hex")
# str() guards against JPype returning a java.lang.String instead of a Python str
row_key = bytes.fromhex(str(Hex.encodeHexString(rowkey_bs)))
This converts the Java byte[] to a Python bytes object, completing the string → byte[] → bytes transformation required for HBase queries.
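The hex round trip sidesteps a mismatch between the two languages: Java bytes are signed (-128 to 127), while a Python bytes object holds values from 0 to 255. The pure-Python sketch below illustrates the same conversion, assuming the Java array is available as a sequence of signed integers. Both helper names are hypothetical and are not part of JPype. The sample values correspond to the first four bytes of the example row key in section 3.1.

```python
def signed_to_bytes(values):
    """Convert Java-style signed byte values (-128..127) to a Python bytes object."""
    return bytes(v & 0xFF for v in values)


def via_hex(values):
    """Produce the same result through a hex string, mirroring the Hex.encodeHexString route."""
    hex_string = "".join("{:02x}".format(v & 0xFF) for v in values)
    return bytes.fromhex(hex_string)


java_style = [1, -111, 33, 2]  # -111 is how Java represents the byte 0x91
assert signed_to_bytes(java_style) == via_hex(java_style) == b"\x01\x91!\x02"
print(signed_to_bytes(java_style))
```

Masking with 0xFF maps each negative value to its unsigned equivalent, which is the step that passing signed values straight to bytes() would be missing.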
4. Conclusion
The guide demonstrates how to install and configure happybase and jpype, start the JVM, invoke Java utilities for row-key creation, and perform basic HBase operations from Python. The steps cover both online and offline installation scenarios and highlight important type-conversion details.
Big Data Technology Architecture
Exploring Open Source Big Data and AI Technologies