Embedding Python in Java with Jython for Real‑Time Big Data Jobs
This article explains why and how to embed Python code in Java using Jython for real‑time big‑data processing, covering performance benefits, memory‑leak pitfalls, singleton interpreter patterns, function factories, Java‑object conversion, and importing external PyPI packages with practical code examples.
The JVM dominates big‑data infrastructure, so Java and Scala are the usual choices. However, when a component needs rapid, configurable changes (for example, letting users supply custom Python code for a real‑time job), pure Java becomes cumbersome, which calls for a dynamic language.
Among dynamic languages, Groovy is designed for the JVM but is not widely known in the data‑engineering community; Python is a more familiar alternative. Several big‑data frameworks already support Python: Hadoop Streaming for MapReduce, PySpark (via Py4J) for Spark, and Jython for Flink.
We selected Jython because it runs inside the same JVM process as the host Java program, eliminating inter‑process data transfer overhead and offering better performance than Hadoop Streaming or Py4J.
Key considerations when using Jython:
Some PyPI packages, especially those with C extensions, cannot run on Jython.
As with Groovy, careless use of Jython can cause memory leaks.
Jython 2.7.1 (released June 2017) supports all Python 2.7 syntax.
Basic Jython usage
The official Jython documentation provides a simple example for embedding Jython in Java:
    import org.python.util.PythonInterpreter;
    import org.python.core.*;

    public class SimpleEmbedded {
        public static void main(String[] args) throws PyException {
            PythonInterpreter interp = new PythonInterpreter();
            System.out.println("Hello, brave new world");
            interp.exec("import sys");
            interp.exec("print sys");
            interp.set("a", new PyInteger(42));
            interp.exec("print a");
            interp.exec("x = 2+2");
            PyObject x = interp.get("x");
            System.out.println("x: " + x);
            System.out.println("Goodbye, cruel world");
        }
    }

While straightforward, this example creates a heavy PythonInterpreter instance each time, which is undesirable.
Repeated calls to PythonInterpreter.eval also cause repeated compilation and memory leaks. The solution is to reuse a single PythonInterpreter instance (e.g., as a singleton) and avoid frequent eval calls.
Implementing a singleton is simple: declare a private static final interpreter variable and share it across the application.
    public class PythonRunner {
        private static final PythonInterpreter intr = new PythonInterpreter();

        public PythonRunner(String code) {
            // ... render function body, compile, store PyFunction
        }

        public Object run() {
            // ... invoke compiled function
        }
    }

To bypass PythonInterpreter.eval, you must use the interpreter’s exec method to compile a function once and then invoke the resulting PyFunction via its __call__ method.
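The "render function body, compile, store" step can be illustrated in plain Python. This is only a sketch of the idea, not the authors' implementation: `make_function` and `user_func` are hypothetical names, and the body string is an assumed example of user-supplied code. On the Java side, the same rendered source string would be passed once to the interpreter's exec, and the function object fetched with get, as in the official example above.

```python
def make_function(body, name="user_func", args=("json",)):
    """Wrap a user-supplied body in a def, exec it once, and return the callable."""
    # Indent every line of the body so it sits inside the generated def.
    indented = "\n".join("    " + line for line in body.splitlines())
    source = "def {0}({1}):\n{2}\n".format(name, ", ".join(args), indented)
    namespace = {}
    # Compilation happens here, once per piece of user code.
    exec(compile(source, "<user code>", "exec"), namespace)
    return namespace[name]


# Assumed user code: the body only, without the surrounding def.
func = make_function("if not json['test']:\n    json['test'] = True\nreturn True")

record = {"test": False}
result = func(record)  # later calls reuse the already compiled function
```

Keeping the returned function and calling it for every record means the source is parsed and compiled once, which is the point of avoiding repeated eval calls.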
Jython maps standard Java objects onto their Python counterparts: primitives and strings are converted to types such as int, bool, and str, while java.util.Map and java.util.List instances support dict-style and list-style access (indexing, assignment, iteration). Java objects can therefore be used in Python code with little ceremony.
Example: a FastJSON JSONObject implements java.util.Map, so it can be used much like a normal Python dict inside Jython.
    def func(json):
        if not json['test']:
            json['test'] = True
        return True

Importing external PyPI packages requires adding their installation directories to Jython’s sys.path. Jython already includes the current working directory, but for packages installed via pip you must locate the site-packages or dist-packages paths and insert them:
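    import java.nio.charset.StandardCharsets;
    import com.alibaba.fastjson.JSONArray;
    import org.apache.commons.io.IOUtils;
    import org.python.util.PythonInterpreter;

    public class PythonRunner {
        private static final PythonInterpreter intr = new PythonInterpreter();

        static {
            intr.exec("import sys");
            try {
                // Ask the system CPython 2 for its sys.path, serialized as JSON.
                Process p = Runtime.getRuntime().exec(new String[]{"python2", "-c", "import json; import sys; print json.dumps(sys.path)"});
                // Read stdout before waiting so a full pipe buffer cannot block the child.
                String stdout = IOUtils.toString(p.getInputStream(), StandardCharsets.UTF_8);
                p.waitFor();
                JSONArray syspathRaw = JSONArray.parseArray(stdout);
                for (int i = 0; i < syspathRaw.size(); i++) {
                    String path = syspathRaw.getString(i);
                    if (path.contains("site-packages") || path.contains("dist-packages")) {
                        // Pass the path as a variable instead of formatting it into source,
                        // so quotes or backslashes in the path cannot break the statement.
                        intr.set("_pkg_path", path);
                        intr.exec("sys.path.insert(0, _pkg_path)");
                    }
                }
            } catch (Exception ex) {
                // Pip-installed packages will not be importable; report instead of hiding it.
                System.err.println("Could not read CPython sys.path: " + ex);
            }
        }
        // ...
    }

Not all PyPI packages work under Jython, especially those with C extensions; always test imports in a Jython shell first.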
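One way to make that import check repeatable is a small script run under the jython launcher before a package is allowed into a job. The sketch below is an assumption about how such a check might look, not part of the original setup: `check_imports` is a hypothetical helper, and the module list is a placeholder to replace with the packages your job depends on.

```python
def check_imports(module_names):
    """Try to import each module; map its name to None (ok) or the error text."""
    results = {}
    for name in module_names:
        try:
            __import__(name)
            results[name] = None
        except Exception as exc:  # ImportError is typical, but report anything
            results[name] = "{0}: {1}".format(type(exc).__name__, exc)
    return results


if __name__ == "__main__":
    # Placeholder list; replace with the packages your job needs.
    for name, error in sorted(check_imports(["json", "re"]).items()):
        print("{0}: {1}".format(name, "ok" if error is None else error))
```

The script avoids syntax specific to Python 3, so the same file can be run under CPython and under Jython to compare results.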
Conclusion
This blog post summarizes our recent work embedding Jython in Java for configurable, hot‑reloaded real‑time jobs, shares practical patterns, and warns about limitations such as potential bugs and incomplete documentation. If you have similar needs, give Jython a try.
NetEase Game Operations Platform
The NetEase Game Automated Operations Platform delivers stable services for thousands of NetEase titles, focusing on efficient ops workflows, intelligent monitoring, and virtualization.