Automating Code Refactoring with Python: Implementing Data Path Integration
This article shows how Python scripts can automatically generate the Objective‑C code needed to register, read, and update hundreds of XXXSetting data items within a unified Data Path framework, eliminating manual refactoring errors, cutting development effort, and streamlining cross‑module data integration.
Software development often runs into complex logic, tangled dependencies, redundant code, and naming that is hard to understand. These problems degrade maintainability, raise maintenance costs, and slow development. Refactoring is the usual way to improve such code structures. However, most refactoring work is done by hand, which is time-consuming and error-prone. For developers, refactoring existing features efficiently and without defects, while leaving the software's behavior unchanged, is a significant challenge. This series uses Python to build automated tools that support the refactoring process.
In the previous article, Through Python Scripts Supporting OC Code Refactoring Practice (Part 1): Module Call Relationship Analysis, we used Python to analyze module call relationships and output them as .csv files. These files can be imported into Excel to evaluate the impact of the refactoring and to choose the refactoring method for each data item. This supports informed decisions about resource allocation, especially when team resources are scarce, and clarifies what needs to be done.
The previous article also mentioned a mechanism for data item communication between modules (called the Data Path in this series). It lets modules in other components read and write data items without those items being publicly exposed. Integrating the XXXSetting module into the Data Path solves the interface incompatibility caused by changes to data items in the XXXSetting module. It also reduces the number of re-releases required of upper-layer dependent components, which indirectly improves development efficiency for requirements involving XXXSetting data items.
Integrating the XXXSetting module's data items into the Data Path means refactoring hundreds of data items to the Data Path standard. Manual refactoring is costly and error-prone, and testing would then require verifying every item one by one, which is also costly. We used Python scripts to generate the integration code for this part. The scripts precisely generate the code segment that integrates each data item into the Data Path, and no bugs were found in the testing and deployment phases.
This article first briefly introduces the basic functions of the Data Path. It then explains how Python automation tools were used to integrate the XXXSetting module, which provides data items, into the Data Path, and describes the approach to automatic code generation.
Data Path Technology Implementation and Integration
Given the goals of this configuration data item refactoring and the need to reuse the Data Path, the Data Path is designed to support the integration of different modules, as shown in Figure 1. Two types of modules relate to the Data Path.
Figure 1
Based on supply and demand, these modules fall into two categories: data item provider modules and data item usage modules.
1. Data Item Provider Module: The data provider (e.g., the XXXSetting module discussed in this article) follows the system's agreed data read/write protocol to expose its data items for reading and writing by other modules. Each data item belongs to one provider module, while a provider module can supply many data items (n:1). The Data Path can integrate multiple data item provider modules (1:n).
2. Data Item Usage Module: The data user (e.g., modules in XXXLib from the previous article) uses the capabilities of the Data Path to read and write data, obtaining and updating the values of the data items it depends on. One Data Path serves many data item usage modules (1:n).
The core idea of the Data Path is to provide a unified interface through which different data item provider modules can be integrated. The Data Path manages the integrated provider modules. When a data item usage module needs to read or write data, it goes through the interface provided by the Data Path instead of calling the provider directly. The Data Path consists of three parts: the data item provider module interface layer, data item provider module management, and the data item read/write service module.
1. Data Item Provider Module Interface Layer: Defines the data read/write capabilities that every data provider must implement. Only modules implemented to this standard can be integrated as data provider modules.
2. Data Item Provider Module Management: Manages all data item provider modules in the system and provides a registration interface. Provider modules call this interface to register the data items they manage with the Data Path. When a read/write request for a data item arrives, this part dispatches it to the provider module that owns the item.
3. Data Item Read/Write Service Module: Provides stable, globally accessible data read/write capabilities. It finds the data item provider module by key and calls that module's interface to read or write the data item.
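To make the three parts concrete, here is a minimal Python model of the Data Path. It is a sketch under assumptions, not the article's OC implementation. The names DataItemProvider, DataPath, register, and supported_keys are hypothetical. string_for_key and update_string loosely mirror the OC stringForKey: and updateString:forKey: interfaces shown later, and only the string type is modeled.

```python
from typing import Dict, List, Optional, Protocol


class DataItemProvider(Protocol):
    """Hypothetical provider interface layer that every provider must implement."""

    def supported_keys(self) -> List[str]: ...

    def string_for_key(self, key: str) -> Optional[str]: ...

    def update_string(self, value: str, key: str) -> None: ...


class DataPath:
    """Provider management plus the read/write service, indexed by data item key."""

    def __init__(self) -> None:
        self._providers: Dict[str, DataItemProvider] = {}

    def register(self, provider: DataItemProvider) -> None:
        # Provider management: remember which provider owns each registered key.
        for key in provider.supported_keys():
            self._providers[key] = provider

    def string_for_key(self, key: str) -> Optional[str]:
        # Read service: look up the owning provider by key and delegate the read.
        provider = self._providers.get(key)
        return provider.string_for_key(key) if provider is not None else None

    def update_string(self, value: str, key: str) -> None:
        # Write service: delegate to the owning provider; unknown keys are ignored.
        provider = self._providers.get(key)
        if provider is not None:
            provider.update_string(value, key)


class XXXSetting:
    """Toy provider; its key follows the class name + "_" + data item name rule."""

    def __init__(self) -> None:
        self.value1 = "hello"

    def supported_keys(self) -> List[str]:
        return ["XXXSetting_value1"]

    def string_for_key(self, key: str) -> Optional[str]:
        if key == "XXXSetting_value1":
            return self.value1
        return None

    def update_string(self, value: str, key: str) -> None:
        if key == "XXXSetting_value1":
            self.value1 = value


path = DataPath()
path.register(XXXSetting())
path.update_string("world", "XXXSetting_value1")
print(path.string_for_key("XXXSetting_value1"))  # prints: world
```

The usage side only ever sees DataPath and a key string. A provider can therefore change how it stores a data item without breaking callers, which is the interface-stability benefit described at the start of this article.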
Integrating the Data Provider into the Data Path
Integrating data items into the Data Path involves two main steps:
1. Register the readable and writable data items: the provider supplies an array containing the key of each data item. Each key consists of the data item provider module's class name, an underscore, and the data item name (class name_data item name). This code is generated automatically by a Python script.
2. Implement data item read/write: the provider module implements the read/write interface, matches the data item by key, and then reads or writes it. This code is also generated automatically by a Python script.
The data item usage module switches from calling the provider directly to calling it indirectly through the Data Path. The detailed implementation will be introduced in the next article. Stay tuned.
Data Items to be Refactored
Section 3.1.2, Extracting Variable Types and Names, of the previous article (Through Python Scripts Supporting OC Code Refactoring Practice (Part 1): Module Call Relationship Analysis) showed how to extract the types and names of all data items after preprocessing.
Section 3.3.2, Data Item Pre-analysis Statistics Output, of the same article identified the data items used by multiple components. These are the candidates for this round of refactoring.
Intersecting these two data sets yields the complete list of data item types and names to refactor, which is the input for generating the read/write code. The following is an example of the data set.
// <data item type> <data item name>;
NSString value1;
NSString value2;
BOOL value3;
...
Data Item List Generation for Data Provider Module
The Data Path itself does not produce data. It only acts as a bridge for reading and writing. When a data provider module integrates into the Data Path, the Data Path must know which data items it can read and write through that provider.
Concretely, the provider module implements the method agreed on in the data item provider module interface layer, and the Data Path calls this method to obtain the list of data items the provider supports. The list is an array of data item keys, each in the format data provider module class name_data item name. The following Python code performs the conversion:
import re

# Example input line: NSString value1; (see the code in section 2.2)
# line holds one preprocessed declaration; className is the provider module's class name
matchObj = re.match(r"(.*)\s+(.*);", line, re.M | re.I)
if matchObj:
    # matchObj.group(2) is the data item name, e.g. value1
    key = ' @"' + className + '_' + matchObj.group(2) + '",\n'
    # key = ' @"className_value1",\n'
    # Each key is written as an OC string literal, one per line, so that multiple keys can initialize an NSArray
Data Item Read Code Generation
Once a provider's data items are registered with the Data Path, the provider must implement the standard read/write interfaces so that it can serve the read and write requests the Data Path forwards to it.
2.4.1 Data Item Read Code Example
The Data Path supports reading basic data types, with a separate read interface for each type. The data provider implements one read interface per data item type. Within each interface, it returns the value of the data item that matches the key. The target OC code to generate is as follows:
// Data item is NSString type
- (NSString *)stringForKey:(NSString *)key {
    if ([key isEqual:@"className_value1"]) {
        return self.value1;
    }
    // If there are multiple data items, they are also automatically merged into the same function
    if ([key isEqual:@"className_value2"]) {
        return self.value2;
    }
    return nil;
}

// Data item is BOOL type
- (BOOL)boolForKey:(NSString *)key {
    if ([key isEqual:@"className_value3"]) {
        return self.value3;
    }
    return NO;
}
// Others...
2.4.2 Data Item Read Implementation Generation
Because each data type has its own read interface, the converted code lines are stored in a separate variable for each data item type during conversion. Each variable starts with the function header, and the function tail is appended once all data items have been converted.
Function Header Example, using the NSString type as an example
# Function header of the NSString read interface, stored in a variable
funName = '- (NSString *)stringForKey:(NSString *)key {\n'
Function Body Example, each data item generates its own code, and the read operations of all data items are stored in sequence
# Example input line: NSString value1; (see the code in section 2.2)
matchObj = re.match(r"(.*)\s+(.*);", line, re.M | re.I)
if matchObj:
    funbody = ' if ([key isEqual:@"'
    # Use the className variable so the key matches the registered key list
    funbody += className + '_' + matchObj.group(2) + '"]) {\n'
    funbody += ' return self.' + matchObj.group(2) + ';\n'
    funbody += ' }\n\n'
    # funbody is the converted read code for one data item: it matches the key and returns
    # the corresponding value, with spaces and line breaks for alignment per the coding standard
    # if ([key isEqual:@"className_value1"]) {
    #     return self.value1;
    # }
Function Tail Example, using the NSString type as an example, appended after all data items are converted
funEnd = ' return nil;\n'
funEnd += '}\n\n'
Different data types are converted sequentially, and after all data items are converted, the results are combined into one file. The content of the file can be copied directly into the project and used.
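The steps above can be combined into a single script: select the shared items, emit the key list, fill one buffer per type with a header, bodies, and a tail, then join everything into one file. The sketch below is one possible arrangement, not the authors' tool. The names generate, parse_declarations, READ_TEMPLATES, and the OC method name supportedKeys are hypothetical. Only NSString and BOOL are handled. The declaration pattern is stricter than the article's regular expression, and the real class name is used in every key.

```python
import re
from typing import Dict, List, Set, Tuple

# Stricter than the article's pattern: exactly "Type name;" made of word characters.
DECL_RE = re.compile(r"^\s*(\w+)\s+(\w+)\s*;\s*$")

# Hypothetical per-type templates: read method header and default return value.
READ_TEMPLATES: Dict[str, Tuple[str, str]] = {
    "NSString": ("- (NSString *)stringForKey:(NSString *)key {\n", "nil"),
    "BOOL": ("- (BOOL)boolForKey:(NSString *)key {\n", "NO"),
}


def parse_declarations(lines: List[str]) -> List[Tuple[str, str]]:
    """Return (type, name) pairs for lines shaped like 'NSString value1;'."""
    pairs = []
    for line in lines:
        match = DECL_RE.match(line)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


def generate(class_name: str, lines: List[str], shared: Set[str]) -> str:
    """Emit a key list method plus one read method per data type."""
    # Intersection: declared items used by multiple components that have a template.
    items = [
        (type_name, name)
        for type_name, name in parse_declarations(lines)
        if name in shared and type_name in READ_TEMPLATES
    ]

    # Key list: one OC string literal per line inside an NSArray literal.
    keys = "".join(f'        @"{class_name}_{name}",\n' for _, name in items)
    out = ["- (NSArray<NSString *> *)supportedKeys {\n    return @[\n"
           + keys + "    ];\n}\n\n"]

    # One buffer per type: the header is added when the buffer is created,
    # then each matching item appends its key check.
    buffers: Dict[str, str] = {}
    for type_name, name in items:
        header, _ = READ_TEMPLATES[type_name]
        body = (
            f'    if ([key isEqual:@"{class_name}_{name}"]) {{\n'
            f"        return self.{name};\n"
            "    }\n"
        )
        buffers[type_name] = buffers.get(type_name, header) + body

    # Tail: the type's default return value and the closing brace.
    for type_name, code in buffers.items():
        _, default = READ_TEMPLATES[type_name]
        out.append(code + f"    return {default};\n}}\n\n")
    return "".join(out)


declarations = ["NSString value1;", "NSString value2;", "BOOL value3;"]
print(generate("XXXSetting", declarations, {"value1", "value3"}))
```

Keeping headers and default return values in one table means that supporting another type is a single new entry. Filtering on that table before building the key list ensures no key is registered without a matching read method.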
Data Item Update
2.5.1 Data Item Update Code Example
The Data Path supports updating basic data types, with a separate update interface for each type. The data provider implements one update interface per data item type. Within each interface, it updates the data item that matches the key. The target OC code to generate is as follows:
// Data item is NSString type
- (void)updateString:(NSString *)value forKey:(NSString *)key {
    if ([key isEqual:@"className_value1"]) {
        self.value1 = value;
        return;
    }
    // If there are multiple data items, they are also automatically merged into the same function
    if ([key isEqual:@"className_value2"]) {
        self.value2 = value;
        return;
    }
}

// Data item is BOOL type
- (void)updateBool:(BOOL)value forKey:(NSString *)key {
    if ([key isEqual:@"className_value3"]) {
        self.value3 = value;
        return;
    }
}
// Others...
2.5.2 Data Item Update Implementation Generation
Because each data type has its own update interface, the converted code lines are stored in a separate variable for each data item type during conversion. Each variable starts with the function header, and the function tail is appended once all data items have been converted.
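The same per-type buffering applies to the update interfaces. This is a sketch under assumptions, not the authors' script. generate_updates and UPDATE_HEADERS are hypothetical names, the input is assumed to be already-parsed (type, name) pairs, and the real provider class name is used in each key. The main difference from the read side is the tail. Update methods return void, so each function only needs a closing brace and no default return value.

```python
from typing import Dict, List, Tuple

# Hypothetical update-method headers, one per supported OC data type.
UPDATE_HEADERS: Dict[str, str] = {
    "NSString": "- (void)updateString:(NSString *)value forKey:(NSString *)key {\n",
    "BOOL": "- (void)updateBool:(BOOL)value forKey:(NSString *)key {\n",
}


def generate_updates(class_name: str, items: List[Tuple[str, str]]) -> str:
    """Build one update method per data type from (type, name) pairs."""
    buffers: Dict[str, str] = {}
    for type_name, name in items:
        header = UPDATE_HEADERS.get(type_name)
        if header is None:
            continue  # types without an update interface are skipped in this sketch
        # Body: match the key, assign the new value, and stop checking further keys.
        body = (
            f'    if ([key isEqual:@"{class_name}_{name}"]) {{\n'
            f"        self.{name} = value;\n"
            "        return;\n"
            "    }\n"
        )
        # The header is added when a type's buffer is first created.
        buffers[type_name] = buffers.get(type_name, header) + body
    # Tail: update methods return void, so only the closing brace is needed.
    return "".join(code + "}\n\n" for code in buffers.values())


print(generate_updates("XXXSetting", [("NSString", "value1"), ("BOOL", "value3")]))
```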
Function Header Example
# Function header of the NSString update interface, stored in a variable
funName = '- (void)updateString:(NSString *)value forKey:(NSString *)key {\n'
Function Body Example, each data item generates its own code, and the update operations of all data items are stored in sequence
# Example input line: NSString value1; (see the code in section 2.2)
matchObj = re.match(r"(.*)\s+(.*);", line, re.M | re.I)
if matchObj:
    funbody = ' if ([key isEqual:@"'
    # Use the className variable so the key matches the registered key list
    funbody += className + '_' + matchObj.group(2) + '"]) {\n'
    funbody += ' self.' + matchObj.group(2) + ' = value;\n'
    funbody += ' return;\n'
    funbody += ' }\n\n'
    # funbody is the converted update code for one data item: it matches the key and assigns
    # the new value, with spaces and line breaks for alignment per the coding standard
    # if ([key isEqual:@"className_value1"]) {
    #     self.value1 = value;
    #     return;
    # }
Function Tail Example, appended after all data items are converted
funEnd = '}\n\n'
Conclusion
Building on the analysis in the previous article, this article integrated the data items used by multiple components into the Data Path, generating the integration code automatically with Python scripts.
Because so many data items are involved, the data items to refactor must first be selected from the full set. A key list must then be generated, and each item must be integrated into the read/write interface for its type. With hand-written code, it is very difficult to guarantee the quality of this integration, and equally difficult to verify that the migration is complete.
Python tools that generate the Data Path integration code can produce the code for every data item automatically and precisely. This reduces the effort required from development and testing and indirectly improves development efficiency.
In the next article, we will introduce how to adapt the data item usage module to integrate into the Data Path through Python scripts. Stay tuned.
Baidu Geek Talk