Developing Custom Presto SQL Functions (UDF) with Java Plugins
This tutorial explains how to create, register, and deploy custom functions for the Presto distributed query engine. It covers the Java annotations used to declare functions, the Presto plugin mechanism, plugin packaging, and state handling for aggregation functions, with code examples for each step.
Presto is an open‑source distributed query engine used at Liulishuo for interactive queries. To extend its SQL capabilities, developers can write custom functions (UDFs) similar to Hive UDFs, covering scalar, aggregation, and window types.
Getting Started with a Scalar Function
A scalar function is a static Java method annotated with @ScalarFunction. A function whose SQL return type is VARCHAR returns a Slice, which Slices.utf8Slice builds from a Java String. Example:
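import io.airlift.slice.Slice;
import io.airlift.slice.Slices;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
// Imports for the Presto annotations and StandardTypes are omitted:
// their packages differ between Presto versions.

/**
 * @author haitao.yao
 */
public class LiulishuoFunctions {
    public static final String DATE_FORMAT = "yyyy-MM-dd";

    @ScalarFunction
    @Description("hive to_date function")
    @SqlType(StandardTypes.VARCHAR)
    public static Slice to_date(@SqlType(StandardTypes.TIMESTAMP) long input) {
        // SimpleDateFormat is not thread-safe, so a new instance is created per call.
        final DateFormat format = new SimpleDateFormat(DATE_FORMAT);
        return Slices.utf8Slice(format.format(new Date(input)));
    }
}
Key steps: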
Define a Java class and mark the static method with @ScalarFunction.
Provide a description using @Description, which appears in SHOW FUNCTIONS.
Specify the return type with @SqlType (e.g., StandardTypes.VARCHAR).
Return a Slice using Slices.utf8Slice.
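The date conversion at the heart of to_date can be exercised without a Presto cluster. The sketch below is a plain-Java stand-in, not the function shown above: the class name ToDateSketch and the choice of UTC are assumptions made for illustration, and it swaps SimpleDateFormat for java.time.DateTimeFormatter, which is immutable and safe to share across threads.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper: the formatting logic of to_date, without any Presto types. */
public class ToDateSketch {
    // Same pattern as DATE_FORMAT in the article; UTC is an assumption of this sketch.
    private static final DateTimeFormatter FORMATTER =
            DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC);

    /** Formats a timestamp given as milliseconds since the epoch. */
    public static String toDate(long epochMillis) {
        return FORMATTER.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        System.out.println(toDate(0L));          // 1970-01-01
        System.out.println(toDate(86_400_000L)); // 1970-01-02
    }
}
```

Inside the Presto function, the returned String would still be wrapped with Slices.utf8Slice, as in the example above.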
Presto Plugin Mechanism
Presto loads custom functions via its plugin system. Write a class that implements the Plugin interface and register it by listing its fully qualified class name in src/main/resources/META-INF/services/com.facebook.presto.spi.Plugin. Example plugin class:
/**
 * @author haitao.yao
 */
public class LiulishuoFunctionsPlugin implements Plugin {
    @Override
    public <T> List<T> getServices(Class<T> type) {
        return ImmutableList.of();
    }
}
Package the compiled classes and the META-INF/services file into a JAR, place it under ${PRESTO_HOME}/plugin/your-plugin-name/, and restart Presto. The startup log will confirm the plugin is loaded.
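For reference, the service registration file holds nothing more than the fully qualified name of the plugin class, one per line, following the java.util.ServiceLoader convention. The package com.example.presto below is a hypothetical placeholder, since the article does not state the real one:

```text
# src/main/resources/META-INF/services/com.facebook.presto.spi.Plugin
com.example.presto.LiulishuoFunctionsPlugin
```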
Registering Custom Functions
When the plugin’s getServices method receives FunctionFactory.class, return a factory that supplies the functions:
/**
 * @author haitao.yao
 */
public class LiulishuoFunctionsPlugin implements Plugin {
    @Override
    public <T> List<T> getServices(Class<T> type) {
        if (type == FunctionFactory.class) {
            final FunctionListBuilder builder = new FunctionListBuilder();
            builder.scalars(LiulishuoFunctions.class)
                    .aggregate(LiulishuoDemoAggregationFunction.class);
            FunctionFactory factory = () -> builder.getFunctions();
            return ImmutableList.of(type.cast(factory));
        } else {
            return ImmutableList.of();
        }
    }
}
The FunctionListBuilder collects all scalar, aggregation, and window functions defined in the project.
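The comparison against FunctionFactory.class together with type.cast is what lets a single generic method hand back differently typed services. The following plain-Java sketch isolates that pattern. The nested FunctionFactory interface and the string function names are hypothetical stand-ins for the Presto types, used only so the example compiles on its own.

```java
import java.util.Collections;
import java.util.List;

/** Plain-Java sketch of the getServices dispatch pattern; no Presto classes are used. */
public class ServiceDispatchSketch {
    /** Hypothetical stand-in for Presto's FunctionFactory. */
    public interface FunctionFactory {
        List<String> listFunctions();
    }

    /** Returns a factory only when the caller asks for FunctionFactory. */
    public static <T> List<T> getServices(Class<T> type) {
        if (type == FunctionFactory.class) {
            FunctionFactory factory = () -> Collections.singletonList("to_date");
            // type.cast performs a checked cast from FunctionFactory to T.
            return Collections.singletonList(type.cast(factory));
        }
        // Any other service type gets an empty list.
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        System.out.println(getServices(FunctionFactory.class).size()); // 1
        System.out.println(getServices(Runnable.class).size());        // 0
    }
}
```

Returning an empty list for unknown types keeps the plugin harmless when the engine asks for services it does not provide.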
Developing an Aggregation Function
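Aggregation functions maintain state across rows and nodes. They consist of three methods: the one annotated with @InputFunction folds each input row into the state, the one annotated with @CombineFunction merges partial states produced on different nodes, and the one annotated with @OutputFunction writes the final result. Example skeleton: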
@AggregationFunction("your_function_name")
@Description("function description goes here")
public final class LiulishuoDemoAggregationFunction {
    // updatedValue, OUTPUT_TYPE, and blockBuilder are placeholders for function-specific logic.
    @InputFunction
    public static void input(SliceState state, @SqlType(StandardTypes.VARCHAR) Slice input) {
        // compute the new state from the current state and the input row
        state.setSlice(Slices.utf8Slice(updatedValue));
    }

    @CombineFunction
    public static void combine(SliceState state, SliceState otherState) {
        // merge otherState (a partial result from another node) into state
        state.setSlice(Slices.utf8Slice(updatedValue));
    }

    @OutputFunction(OUTPUT_TYPE)
    public static void output(SliceState state, BlockBuilder out) {
        // write the final result built from state
        out.writeObject(blockBuilder.build());
        out.closeEntry();
    }
}
State must implement AccumulatorState and, when it needs a custom serializer or factory, be annotated with @AccumulatorStateMetadata. For complex state, a common shortcut is to serialize the state to a JSON string and store it in a SliceState using Slices.utf8Slice:
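// ObjectMapper is thread-safe once configured, so one shared instance is enough.
private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper();
@InputFunction
public static void input(SliceState state, @SqlType(StandardTypes.VARCHAR) Slice input) {
    try {
        final String updatedValue = OBJECT_MAPPER.writeValueAsString(input.toStringUtf8());
        state.setSlice(Slices.utf8Slice(updatedValue));
    } catch (JsonProcessingException e) {
        // writeValueAsString declares a checked exception, which must be handled here
        throw new RuntimeException(e);
    }
}
Although JSON serialization adds overhead, it greatly simplifies development when performance requirements are modest.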
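The division of labor between the three annotated methods is easier to see when run end to end. The sketch below simulates it in plain Java for an average over rows split across two workers. StringState is a hypothetical stand-in for SliceState, and the "count:sum" encoding is an assumption standing in for the JSON string, so that the example needs no Presto or Jackson dependency.

```java
import java.util.Arrays;
import java.util.List;

/** Plain-Java simulation of the input, combine, and output phases; no Presto classes are used. */
public class AggregationLifecycleSketch {
    /** Hypothetical stand-in for SliceState: holds one serialized value, null until first use. */
    public static final class StringState {
        private String value;
        public String get() { return value; }
        public void set(String value) { this.value = value; }
    }

    // The state is serialized as "count:sum" (an encoding chosen for this sketch, not JSON).
    private static long[] decode(StringState state) {
        if (state.get() == null) {
            return new long[] {0L, 0L};
        }
        String[] parts = state.get().split(":");
        return new long[] {Long.parseLong(parts[0]), Long.parseLong(parts[1])};
    }

    private static void encode(StringState state, long count, long sum) {
        state.set(count + ":" + sum);
    }

    /** Input phase: fold one row into the partial state of the current worker. */
    public static void input(StringState state, long value) {
        long[] current = decode(state);
        encode(state, current[0] + 1, current[1] + value);
    }

    /** Combine phase: merge a partial state produced elsewhere into this one. */
    public static void combine(StringState state, StringState other) {
        long[] a = decode(state);
        long[] b = decode(other);
        encode(state, a[0] + b[0], a[1] + b[1]);
    }

    /** Output phase: turn the merged state into the final result (null if no rows were seen). */
    public static Double output(StringState state) {
        long[] merged = decode(state);
        return merged[0] == 0 ? null : (double) merged[1] / merged[0];
    }

    /** Simulates two workers that each see part of the rows. */
    public static Double average(List<Long> workerA, List<Long> workerB) {
        StringState stateA = new StringState();
        StringState stateB = new StringState();
        workerA.forEach(v -> input(stateA, v));
        workerB.forEach(v -> input(stateB, v));
        combine(stateA, stateB);
        return output(stateA);
    }

    public static void main(String[] args) {
        System.out.println(average(Arrays.asList(1L, 2L, 3L), Arrays.asList(4L, 5L))); // 3.0
    }
}
```

Because each phase decodes and re-encodes the whole state, the cost grows with the size of the serialized value, which is the overhead the article accepts in exchange for simpler code.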
Summary
At Liulishuo we have built numerous Presto UDFs to improve SQL development efficiency, ranging from IP‑lookup functions to RPC‑based data transformations. Because official documentation is scarce, this guide aims to help developers create, package, and register both scalar and aggregation functions in Presto.
Liulishuo Tech Team
Help everyone become a global citizen!