
Apache InLong SPI Refactoring: Reducing Maintenance Costs and Boosting Extensibility

This article explains how Apache InLong's manager service applied SPI‑based refactoring to simplify code, lower maintenance overhead, and dramatically improve extensibility for a rapidly growing variety of data sources and sinks in large‑scale data integration scenarios.

DataFunSummit

During the migration of InLong to the cloud, the rapid increase in data source and sink types caused high maintenance costs, duplicated code, and difficulty extending the system. By applying a Service Provider Interface (SPI) extension in InLong Manager, the team reduced maintenance effort, increased code reuse, and greatly enhanced extensibility.

1. Project Overview

Apache InLong is a one-stop integration framework for massive data that provides automatic, secure, reliable, and high-performance data transmission, and supports stream-based analytics, modeling, and applications. It graduated to an Apache Top-Level Project in June 2022.

2. InLong Manager

InLong Manager provides complete data service governance, covering metadata, task flows, permissions, and OpenAPI. It manages metadata for tasks, clusters, and schemas, and orchestrates the full pipeline from ingestion to storage.

3. What is SPI?

SPI (Service Provider Interface) is a Java mechanism that lets third parties extend or replace components. Implementations are declared in provider-configuration files under META-INF/services and loaded at runtime through java.util.ServiceLoader.
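
The lookup described above can be sketched as follows. This is a minimal illustration, not InLong's code: the names SpiLookupSketch, SinkPlugin, HiveSinkPlugin, and KafkaSinkPlugin are hypothetical. Only java.util.ServiceLoader and the META-INF/services convention come from the JDK.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.ServiceLoader;

public class SpiLookupSketch {

    /** Hypothetical service interface; each sink type ships one implementation. */
    public interface SinkPlugin {
        String sinkType();
    }

    /** Hypothetical provider for Hive sinks. */
    public static final class HiveSinkPlugin implements SinkPlugin {
        @Override
        public String sinkType() {
            return "HIVE";
        }
    }

    /** Hypothetical provider for Kafka sinks. */
    public static final class KafkaSinkPlugin implements SinkPlugin {
        @Override
        public String sinkType() {
            return "KAFKA";
        }
    }

    /** Indexes providers by sink type and rejects two providers for the same type. */
    public static Map<String, SinkPlugin> index(Iterable<SinkPlugin> providers) {
        Map<String, SinkPlugin> byType = new LinkedHashMap<>();
        for (SinkPlugin plugin : providers) {
            if (byType.putIfAbsent(plugin.sinkType(), plugin) != null) {
                throw new IllegalStateException("Duplicate provider for " + plugin.sinkType());
            }
        }
        return byType;
    }

    /**
     * Discovers providers through ServiceLoader. Each provider must be listed,
     * one fully qualified class name per line, in a classpath resource named
     * META-INF/services/<binary name of SinkPlugin>.
     */
    public static Map<String, SinkPlugin> discover() {
        return index(ServiceLoader.load(SinkPlugin.class));
    }

    public static void main(String[] args) {
        // Without a provider-configuration file on the classpath, discovery finds nothing.
        System.out.println("Discovered: " + discover().keySet());
        // The same indexing also works on providers supplied explicitly.
        System.out.println("Explicit: "
                + index(Arrays.asList(new HiveSinkPlugin(), new KafkaSinkPlugin())).keySet());
    }
}
```

Because ServiceLoader is itself an Iterable, the indexing logic does not depend on how the providers were obtained, which keeps it easy to exercise without packaging a provider-configuration file.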

4. SPI Refactoring Process

The refactoring removed extensive if-else and switch-case logic, unified the service interfaces, and introduced a generic sink configuration model in which type-specific parameters are stored as JSON in an extensible field. Key code paths include org.apache.inlong.manager.service.sink.StreamSinkServiceImpl and org.apache.inlong.manager.service.sink.SinkOperatorFactory.
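
The shape of such a refactoring can be sketched as below. It is an assumption-laden illustration rather than the actual StreamSinkServiceImpl or SinkOperatorFactory code: every class, method, and parameter name here (SinkRefactorSketch, SinkOperator, OperatorFactory, save, extParams, and the sample keys) is hypothetical, and the tiny JSON writer stands in for whatever JSON library a real project would use.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class SinkRefactorSketch {

    /** Generic stored form: shared columns plus one extensible JSON field. */
    public static final class SinkEntity {
        public final String sinkName;
        public final String sinkType;
        public final String extParams; // type-specific settings serialized as JSON

        SinkEntity(String sinkName, String sinkType, String extParams) {
            this.sinkName = sinkName;
            this.sinkType = sinkType;
            this.extParams = extParams;
        }
    }

    /** Hypothetical per-type operator; each one replaces a branch of the old switch-case. */
    public interface SinkOperator {
        boolean accept(String sinkType);

        List<String> requiredKeys();
    }

    public static final class HiveOperator implements SinkOperator {
        public boolean accept(String sinkType) { return "HIVE".equals(sinkType); }
        public List<String> requiredKeys() { return Arrays.asList("jdbcUrl", "tableName"); }
    }

    public static final class KafkaOperator implements SinkOperator {
        public boolean accept(String sinkType) { return "KAFKA".equals(sinkType); }
        public List<String> requiredKeys() { return Arrays.asList("bootstrapServers", "topicName"); }
    }

    /** Selects an operator by asking each one, so no list of types is hard-coded here. */
    public static final class OperatorFactory {
        private final List<SinkOperator> operators;

        public OperatorFactory(List<SinkOperator> operators) {
            this.operators = operators;
        }

        public SinkOperator getInstance(String sinkType) {
            return operators.stream()
                    .filter(op -> op.accept(sinkType))
                    .findFirst()
                    .orElseThrow(() -> new IllegalArgumentException("Unsupported sink type: " + sinkType));
        }
    }

    /** Single save path shared by every sink type. */
    public static SinkEntity save(OperatorFactory factory, String name, String type,
                                  Map<String, String> params) {
        SinkOperator operator = factory.getInstance(type);
        for (String key : operator.requiredKeys()) {
            if (!params.containsKey(key)) {
                throw new IllegalArgumentException("Missing parameter '" + key + "' for sink type " + type);
            }
        }
        return new SinkEntity(name, type, toJson(params));
    }

    /** Minimal JSON object writer for string values; keys are sorted for stable output. */
    static String toJson(Map<String, String> params) {
        return new TreeMap<>(params).entrySet().stream()
                .map(e -> quote(e.getKey()) + ":" + quote(e.getValue()))
                .collect(Collectors.joining(",", "{", "}"));
    }

    // Escapes only backslashes and double quotes, which is enough for this sketch.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

In this arrangement, supporting another sink type means adding one operator class and registering it with the factory. The save path and the stored record layout stay unchanged because the new type's parameters travel inside the JSON field.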

5. Benefits After Refactoring

The changes resulted in (i) higher code reuse and lower duplication, (ii) significantly improved extensibility, since new sink types can be added without modifying existing interfaces, (iii) a reduced need for frequent DDL changes, and (iv) the ability to extend internal configurations without altering the upstream open-source code.

6. Q&A Highlights

The article also covers common questions about InLong's application scenarios, the use of FlinkCDC on the source side, how sink connectors such as Hudi or Doris can be added via SPI, and the current lack of automatic DDL synchronization for databases.

Tags: big data, SPI, data integration, extensibility, Apache InLong