SPI Refactoring Practice in Apache InLong Manager to Reduce Maintenance Cost and Enhance Extensibility
This article presents the SPI-based refactoring of Apache InLong Manager. It covers the project's background, the maintenance challenges it faced, the concept of the Java Service Provider Interface (SPI), the concrete implementation steps and code restructuring, and the resulting benefits: higher code reuse, easier extension, and fewer DDL changes.
Apache InLong is a one-stop integration framework for massive data that provides automatic, secure, reliable, and high-performance data transmission. It was originally contributed by Tencent in 2019 and graduated as an Apache Top-Level Project in 2022.
InLong Manager, InLong's control service, provides comprehensive data service governance capabilities covering metadata, task flows, permissions, and OpenAPI.
As InLong moved to the cloud, the number of supported data source and sink types grew rapidly, which led to high maintenance costs, duplicated code, and a system that was hard to extend.
The article identifies three main pain points: high maintenance cost, because many tables repeated the same fields; large amounts of near-identical if-else/switch-case logic; and poor extensibility, because supporting a new type required modifying existing code, which violates the open-closed principle.
It then introduces the Java Service Provider Interface (SPI), a mechanism that lets third parties supply implementations of an interface without changing the code that uses it, and gives common examples: database driver loading, SLF4J binding to a logging implementation, and Spring type conversion.
The SPI implementation process is illustrated with the Flink JDBC connector: an open interface (JdbcDialectFactory) is defined; the implementations are listed in a provider-configuration file under META-INF/services whose name is the interface's fully qualified class name; and java.util.ServiceLoader discovers the listed implementations at runtime so the appropriate one can be selected.
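The three steps can be sketched as one self-contained Java program. It is a minimal illustration under stated assumptions, not Flink's code: the interface DialectFactory and its dialectName() method are hypothetical simplifications of Flink's JdbcDialectFactory, and because a real provider jar would ship its own META-INF/services file, the sketch writes that file into a temporary directory and exposes it through a URLClassLoader so it runs without packaging.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class SpiDemo {

    // Step 1: the open interface. Loosely modeled on Flink's JdbcDialectFactory;
    // the name and dialectName() are hypothetical simplifications.
    public interface DialectFactory {
        boolean acceptsURL(String url);

        String dialectName();
    }

    // Provider implementations; in practice each would ship in its own jar.
    public static class MySqlDialectFactory implements DialectFactory {
        @Override public boolean acceptsURL(String url) { return url.startsWith("jdbc:mysql:"); }
        @Override public String dialectName() { return "MySQL"; }
    }

    public static class PostgresDialectFactory implements DialectFactory {
        @Override public boolean acceptsURL(String url) { return url.startsWith("jdbc:postgresql:"); }
        @Override public String dialectName() { return "PostgreSQL"; }
    }

    // Step 2: write META-INF/services/<interface binary name>, one provider per line.
    // A real jar contains this file; here it is generated so the example is self-contained.
    static ClassLoader loaderWithServiceFile() {
        try {
            Path root = Files.createTempDirectory("spi-demo");
            Path servicesDir = Files.createDirectories(root.resolve("META-INF").resolve("services"));
            String providers = MySqlDialectFactory.class.getName() + "\n"
                    + PostgresDialectFactory.class.getName() + "\n";
            Files.write(servicesDir.resolve(DialectFactory.class.getName()),
                    providers.getBytes(StandardCharsets.UTF_8));
            // Parent-first delegation: the provider classes come from the parent loader,
            // only the provider-configuration file comes from the temp directory.
            return new URLClassLoader(new URL[] {root.toUri().toURL()}, SpiDemo.class.getClassLoader());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Step 3: discover providers with ServiceLoader and keep the one that accepts the URL.
    static DialectFactory load(String url, ClassLoader loader) {
        List<DialectFactory> matches = new ArrayList<>();
        for (DialectFactory factory : ServiceLoader.load(DialectFactory.class, loader)) {
            if (factory.acceptsURL(url)) {
                matches.add(factory);
            }
        }
        if (matches.size() != 1) {
            throw new IllegalStateException(
                    "Expected exactly one dialect for " + url + " but found " + matches.size());
        }
        return matches.get(0);
    }

    public static void main(String[] args) {
        ClassLoader loader = loaderWithServiceFile();
        // prints PostgreSQL
        System.out.println(load("jdbc:postgresql://localhost:5432/demo", loader).dialectName());
    }
}
```

Requiring exactly one matching provider is a choice of this sketch that avoids ambiguous dispatch when two jars claim the same URL prefix; the calling code never names a concrete dialect class.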
The refactoring of InLong Manager's service layer removes cumbersome if-else/switch-case statements, consolidates the service interfaces, and introduces a unified sink operator factory; see org.apache.inlong.manager.service.sink.StreamSinkServiceImpl and org.apache.inlong.manager.service.sink.SinkOperatorFactory.
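Factory-based dispatch of this kind can be sketched in plain Java. This is a simplified illustration, not InLong's source: the SinkOperator interface, its accept/save methods, and the HIVE and KAFKA operators are hypothetical, and the operator list is passed in by hand, whereas a Spring-based service would typically have the container inject every operator bean.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SinkOperatorFactoryDemo {

    // Hypothetical, simplified operator contract: one implementation per sink type.
    interface SinkOperator {
        boolean accept(String sinkType);

        String save(String sinkName);
    }

    static class HiveSinkOperator implements SinkOperator {
        @Override public boolean accept(String sinkType) { return "HIVE".equals(sinkType); }
        @Override public String save(String sinkName) { return "saved Hive sink " + sinkName; }
    }

    static class KafkaSinkOperator implements SinkOperator {
        @Override public boolean accept(String sinkType) { return "KAFKA".equals(sinkType); }
        @Override public String save(String sinkName) { return "saved Kafka sink " + sinkName; }
    }

    // Replaces a switch over sinkType: each registered operator is asked whether it
    // accepts the type, and the first one that does is returned.
    static class SinkOperatorFactory {
        private final List<SinkOperator> operators;

        SinkOperatorFactory(List<SinkOperator> operators) {
            this.operators = new ArrayList<>(operators);
        }

        SinkOperator getInstance(String sinkType) {
            return operators.stream()
                    .filter(op -> op.accept(sinkType))
                    .findFirst()
                    .orElseThrow(() -> new IllegalArgumentException("Unsupported sink type: " + sinkType));
        }
    }

    public static void main(String[] args) {
        List<SinkOperator> operators = Arrays.asList(new HiveSinkOperator(), new KafkaSinkOperator());
        SinkOperatorFactory factory = new SinkOperatorFactory(operators);
        // Service-layer code no longer branches on the type; prints "saved Kafka sink orders".
        System.out.println(factory.getInstance("KAFKA").save("orders"));
    }
}
```

Compared with a switch over sinkType, supporting a new sink type here means adding one operator class and registering it; getInstance itself never changes, which is the open-closed property the refactoring aims for.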
The database entity model is also redesigned so that a single table can store the configuration of any sink type: generic columns hold the attributes common to all sinks, and an ext_params JSON field holds the type-specific parameters.
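A minimal sketch of the single-table idea, assuming hypothetical field names (sinkType, sinkName, extParams) and hypothetical Hive parameters (dbName, fileFormat). To stay dependency-free it uses a tiny hand-written encoder for flat string maps as a stand-in for a real JSON library; it is not meant to reflect how InLong actually serializes ext_params.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SinkEntityDemo {

    // Hypothetical single-table entity: columns shared by every sink type plus one
    // ext_params column holding type-specific settings as JSON text.
    static class StreamSinkEntity {
        String sinkType;  // e.g. "HIVE", "KAFKA"
        String sinkName;
        String extParams; // JSON text for type-specific parameters
    }

    // Minimal JSON encoder for a flat String-to-String map (stand-in for a JSON library).
    static String toJson(Map<String, String> params) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
        }
        return sb.append('}').toString();
    }

    // Wraps a string in quotes, escaping quotes, backslashes, and control characters.
    private static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Every sink type is stored the same way; only the ext_params content differs.
    static StreamSinkEntity toEntity(String sinkType, String sinkName, Map<String, String> typeSpecific) {
        StreamSinkEntity entity = new StreamSinkEntity();
        entity.sinkType = sinkType;
        entity.sinkName = sinkName;
        entity.extParams = toJson(typeSpecific);
        return entity;
    }

    public static void main(String[] args) {
        Map<String, String> hive = new LinkedHashMap<>();
        hive.put("dbName", "ods");
        hive.put("fileFormat", "TextFile");
        StreamSinkEntity entity = toEntity("HIVE", "hive_sink_1", hive);
        System.out.println(entity.extParams); // {"dbName":"ods","fileFormat":"TextFile"}
    }
}
```

Because type-specific settings live inside ext_params, adding a parameter for one sink type changes only the JSON content rather than the table schema, which is where the reduction in DDL changes comes from.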
Key classes such as org.apache.inlong.manager.service.sink.AbstractSinkOperator, org.apache.inlong.manager.service.sink.StreamSinkOperator, and the related operators are listed for further reference.
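The split between an abstract base operator and type-specific operators suggests a template-method arrangement: the base class handles what every sink shares, and each subclass supplies only its own details. The sketch assumes hypothetical, simplified types (SinkRequest, SinkEntity), a hypothetical hook name setTargetEntity, and a made-up initial status; the real InLong signatures may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AbstractSinkOperatorDemo {

    // Hypothetical request and entity types, simplified for the sketch.
    static class SinkRequest {
        final String sinkType;
        final String sinkName;
        final Map<String, String> properties;

        SinkRequest(String sinkType, String sinkName, Map<String, String> properties) {
            this.sinkType = sinkType;
            this.sinkName = sinkName;
            this.properties = properties;
        }
    }

    static class SinkEntity {
        String sinkType;
        String sinkName;
        String status;
        Map<String, String> extParams = new LinkedHashMap<>();
    }

    interface StreamSinkOperator {
        boolean accept(String sinkType);

        SinkEntity saveOpt(SinkRequest request);
    }

    // Template method: shared fields and checks live here; subclasses fill in the rest.
    abstract static class AbstractSinkOperator implements StreamSinkOperator {
        protected abstract String getSinkType();

        // Hypothetical hook for type-specific fields.
        protected abstract void setTargetEntity(SinkRequest request, SinkEntity target);

        @Override
        public boolean accept(String sinkType) {
            return getSinkType().equals(sinkType);
        }

        @Override
        public final SinkEntity saveOpt(SinkRequest request) {
            if (!accept(request.sinkType)) {
                throw new IllegalArgumentException(getSinkType() + " operator cannot save " + request.sinkType);
            }
            SinkEntity entity = new SinkEntity();
            entity.sinkType = request.sinkType; // common to every sink type
            entity.sinkName = request.sinkName;
            entity.status = "NEW";              // hypothetical initial status
            setTargetEntity(request, entity);   // type-specific part
            return entity;
        }
    }

    static class KafkaSinkOperator extends AbstractSinkOperator {
        @Override
        protected String getSinkType() {
            return "KAFKA";
        }

        @Override
        protected void setTargetEntity(SinkRequest request, SinkEntity target) {
            // Only Kafka-specific settings are handled here; default topic is the sink name.
            target.extParams.put("topicName", request.properties.getOrDefault("topicName", target.sinkName));
        }
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("topicName", "orders_topic");
        SinkEntity entity = new KafkaSinkOperator().saveOpt(new SinkRequest("KAFKA", "orders", props));
        System.out.println(entity.sinkName + " " + entity.extParams); // orders {topicName=orders_topic}
    }
}
```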
The SPI-based refactoring yields higher code reuse, much stronger extensibility, fewer DDL changes, and the ability to add new internal configuration items without modifying existing interfaces.
The article concludes with a Q&A session covering InLong’s application scenarios, the use of FlinkCDC on the source side, SPI‑based connector extensions for sinks, and future plans for DDL synchronization.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.