
Design and Evaluation of a JSON Similarity Algorithm for Reducing Diff Noise in Traffic Replay

This article presents a systematic way to separate effective from ineffective diff failures in traffic replay. It designs a JSON similarity model built on value, key, and structure comparisons, implements it in Java, and reports experiments in which it is considerably more accurate than the built-in system diff (86.89% vs. 53.35%).

转转QA

Feb 21, 2024


Business Background

During traffic replay we observed a large number of diff failures, many of which were not caused by genuine business scenarios. This noise made it difficult to judge whether replay results were actually correct.

Approach

1. Distinguishing Effective and Ineffective Results

An effective failure is an exception caused by the business service itself; an ineffective failure is a non-business exception. The original article lists common exception types and sorts them into these business and non-business categories.

2. Judging Abnormal Results

Analyzing every failure by hand is inefficient, so an automated and reliable way to judge abnormal results is needed.

3. What to Compare

The comparison should consider result status, value differences, key differences, and response text differences.

Exploring Similarity Models

Reference Algorithms

Jaccard similarity coefficient and distance, Levenshtein (edit) distance, and cosine similarity are introduced as potential metrics for comparing two JSON objects.
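To make the candidate metrics concrete, here is a minimal plain-Java sketch of each. It is illustrative only and not taken from the original article: the class name ReferenceMetrics is hypothetical, and applying Jaccard to sets of key paths, Levenshtein to raw response text, and cosine similarity to token-count maps is an assumption about how they might be used on JSON.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class ReferenceMetrics {

    // Jaccard similarity: |A ∩ B| / |A ∪ B|. The Jaccard distance is 1 minus this value.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0; // two empty sets are treated as identical
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    // Levenshtein distance: the minimum number of single-character insertions,
    // deletions, and substitutions that turn s into t (two-row dynamic programming).
    static int levenshtein(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j; // distance from the empty prefix of s
        }
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // the finished row becomes the previous row
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }

    // Cosine similarity between two term-frequency vectors stored as maps.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            int x = e.getValue();
            normA += (double) x * x;
            dot += (double) x * b.getOrDefault(e.getKey(), 0);
        }
        for (int y : b.values()) {
            normB += (double) y * y;
        }
        if (normA == 0 || normB == 0) {
            return 0.0; // an all-zero vector has no direction
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Note that Levenshtein runs in O(m·n) time for strings of lengths m and n, which becomes expensive on large response bodies.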

Design Ideas

Three rule‑based dimensions are proposed:

Value similarity: compare the values at the same key paths and compute the percentage that are equal.

Key similarity: compute the percentage of non-empty keys that match.

Structure similarity: measure how completely the two JSON structures overlap (structural integrity).
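All three dimensions compare values, keys, or paths at corresponding positions in the two documents. The source does not show its helper methods (findAllValues and the others), so the following is a hypothetical stand-in: a dependency-free flattener that assumes the JSON has already been parsed into nested java.util.Map and java.util.List values. The class name JsonPaths and the dotted/bracketed path format are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class JsonPaths {

    // Flattens a JSON value (nested Map / List / scalars) into a map from
    // path to leaf value, e.g. "result.items[0]" -> "a".
    // Empty objects and arrays contribute no entries.
    static Map<String, Object> flatten(Object json) {
        Map<String, Object> out = new LinkedHashMap<>();
        walk("", json, out);
        return out;
    }

    private static void walk(String path, Object node, Map<String, Object> out) {
        if (node instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                String key = String.valueOf(e.getKey());
                walk(path.isEmpty() ? key : path + "." + key, e.getValue(), out);
            }
        } else if (node instanceof List) {
            List<?> list = (List<?>) node;
            for (int i = 0; i < list.size(); i++) {
                walk(path + "[" + i + "]", list.get(i), out);
            }
        } else {
            out.put(path, node); // scalar leaf: string, number, boolean, or null
        }
    }
}
```

Comparing two such maps path by path is the common basis for the value, key, and structure checks.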

Process Design

The final similarity formula is:

(valueSimilarity * 0.2 + keySimilarity * 0.4 + structureSimilarity * 0.4) * 100%

Why these weights? Value similarity gets only 0.2 because recorded data can vary widely across environments. Key similarity gets 0.4 because key differences reflect interface changes. Structure similarity gets 0.4 to enforce structural consistency. The weights sum to 1.0, so the final score ranges from 0% to 100%.
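The weighted formula can be sketched with BigDecimal so the weights stay exact decimals. This is an illustration, not the article's code: the class and method names are hypothetical, and the check that the weights sum to 1 is an added assumption that keeps the result within 0% to 100% while still allowing the weights to be configured.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class WeightedSimilarity {

    private final BigDecimal valueWeight;
    private final BigDecimal keyWeight;
    private final BigDecimal structureWeight;

    // Weights are passed as decimal strings to avoid binary floating-point error.
    WeightedSimilarity(String valueWeight, String keyWeight, String structureWeight) {
        this.valueWeight = new BigDecimal(valueWeight);
        this.keyWeight = new BigDecimal(keyWeight);
        this.structureWeight = new BigDecimal(structureWeight);
        BigDecimal sum = this.valueWeight.add(this.keyWeight).add(this.structureWeight);
        if (sum.compareTo(BigDecimal.ONE) != 0) {
            throw new IllegalArgumentException("weights must sum to 1, got " + sum);
        }
    }

    // The article's weighting: value 0.2, key 0.4, structure 0.4.
    static WeightedSimilarity articleDefaults() {
        return new WeightedSimilarity("0.2", "0.4", "0.4");
    }

    // Combines three similarities in [0, 1] into a percentage rounded to two decimals.
    BigDecimal percent(BigDecimal value, BigDecimal key, BigDecimal structure) {
        BigDecimal weighted = value.multiply(valueWeight)
                .add(key.multiply(keyWeight))
                .add(structure.multiply(structureWeight));
        return weighted.multiply(BigDecimal.valueOf(100)).setScale(2, RoundingMode.HALF_UP);
    }
}
```

For example, value 0.5, key 1.0 and structure 0.75 give 0.2 × 0.5 + 0.4 × 1.0 + 0.4 × 0.75 = 0.8, so articleDefaults().percent(...) returns 80.00. The source does not state the threshold at which a replay counts as a pass, so none is assumed here.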

Similarity Algorithm Experiment

Test Data

The experiment uses two JSON strings as examples, a recorded response and a replayed response. The following Java code parses them into JSONObject instances, which the three checks below then compare.

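// Assumption: JSONObject is org.json's, whose constructor parses a JSON string.
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.*;
import java.util.stream.Collectors;
import org.json.JSONObject;

String jsonString1 = "{\"result\":{...}}";  // full payloads elided in the source
String jsonString2 = "{\"result\":{...}}";

public boolean resultOfContrast(String recordResponse, String replayResponse){
    JSONObject json1 = new JSONObject(recordResponse);  // recorded response
    JSONObject json2 = new JSONObject(replayResponse);  // replayed response
    // ...
}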

1. Value Difference Check

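// Value similarity: share of json1's values that are equal at the same path in json2.
// findAllValues and isValueEqual are helper methods not shown in the source.
public static BigDecimal compareResultValues(JSONObject json1, JSONObject json2){
    Map<String, Object> allValues1 = findAllValues(json1);
    Map<String, Object> allValues2 = findAllValues(json2);
    int totalKeys = allValues1.size();
    if (totalKeys == 0){
        // Avoid 0/0: BigDecimal.valueOf(Double.NaN) would throw NumberFormatException.
        return (allValues2.isEmpty() ? BigDecimal.ONE : BigDecimal.ZERO).setScale(4);
    }
    int matchedKeys = 0;
    for (Map.Entry<String, Object> entry : allValues1.entrySet()){
        if (allValues2.containsKey(entry.getKey())){
            Object v1 = entry.getValue();
            Object v2 = allValues2.get(entry.getKey());
            if (v1 != null && v2 != null && isValueEqual(v1, v2)){
                matchedKeys++;
            }
        }
    }
    double similarity = (double) matchedKeys / totalKeys;
    return BigDecimal.valueOf(similarity).setScale(4, RoundingMode.HALF_UP);
}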
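The source does not show findAllValues or isValueEqual. The following hypothetical sketch works on already-flattened path-to-value maps, such as a flattener over nested Map/List values would produce. It assumes one possible equality rule: numbers compare by numeric value, so 1 and 1.0 match, and everything else uses Objects.equals. It also guards against dividing by zero when the recorded response has no values. The class name ValueSimilarity is hypothetical.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Map;
import java.util.Objects;

final class ValueSimilarity {

    // Hypothetical equality rule: numbers compare by numeric value (1 equals 1.0),
    // everything else with Objects.equals. Assumes numbers are finite (JSON has no NaN).
    static boolean isValueEqual(Object v1, Object v2) {
        if (v1 instanceof Number && v2 instanceof Number) {
            return new BigDecimal(v1.toString()).compareTo(new BigDecimal(v2.toString())) == 0;
        }
        return Objects.equals(v1, v2);
    }

    // Share of paths in `recorded` whose value is equal at the same path in `replayed`.
    static BigDecimal similarity(Map<String, Object> recorded, Map<String, Object> replayed) {
        if (recorded.isEmpty()) {
            return (replayed.isEmpty() ? BigDecimal.ONE : BigDecimal.ZERO).setScale(4);
        }
        int matched = 0;
        for (Map.Entry<String, Object> e : recorded.entrySet()) {
            Object other = replayed.get(e.getKey()); // null when the path is missing
            if (e.getValue() != null && other != null && isValueEqual(e.getValue(), other)) {
                matched++;
            }
        }
        return BigDecimal.valueOf(matched)
                .divide(BigDecimal.valueOf(recorded.size()), 4, RoundingMode.HALF_UP);
    }
}
```

For example, recorded {a: 1, b: "x", c: true, d: 2.5} against replayed {a: 1.0, b: "y", c: true} gives 2/4 = 0.5000: a and c match, b differs, and d is missing from the replay.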

2. Key Difference Check

// Key similarity: share of json1's non-empty keys whose path lists are identical in json2.
// findNonEmptyKeyPaths is a helper method not shown in the source.
public static BigDecimal compareNonEmptyKeys(JSONObject json1, JSONObject json2){
    Map<String, List<String>> nonEmptyKeyPaths1 = findNonEmptyKeyPaths(json1);
    Map<String, List<String>> nonEmptyKeyPaths2 = findNonEmptyKeyPaths(json2);
    int total = nonEmptyKeyPaths1.size();
    if (total == 0){
        // Avoid 0/0 when json1 has no non-empty keys.
        return (nonEmptyKeyPaths2.isEmpty() ? BigDecimal.ONE : BigDecimal.ZERO).setScale(4);
    }
    Set<String> commonKeys = nonEmptyKeyPaths1.keySet().stream()
        .filter(nonEmptyKeyPaths2::containsKey)
        .collect(Collectors.toSet());
    int matched = (int) commonKeys.stream()
        .filter(k -> nonEmptyKeyPaths1.get(k).equals(nonEmptyKeyPaths2.get(k)))
        .count();
    double similarity = (double) matched / total;
    return BigDecimal.valueOf(similarity).setScale(4, RoundingMode.HALF_UP);
}
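findNonEmptyKeyPaths is not shown in the source either. One hypothetical reading, sketched below over nested Map/List values, maps each key name to the sorted list of paths where it holds a non-empty value (not null, "", {} or []). It then counts the keys whose path lists are identical in both documents, as compareNonEmptyKeys does. The class name KeySimilarity and the definition of "empty" are assumptions.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class KeySimilarity {

    // Key name -> sorted paths at which that key holds a non-empty value.
    static Map<String, List<String>> nonEmptyKeyPaths(Object json) {
        Map<String, List<String>> out = new TreeMap<>();
        walk("", json, out);
        for (List<String> paths : out.values()) {
            Collections.sort(paths); // make the comparison independent of map iteration order
        }
        return out;
    }

    private static void walk(String path, Object node, Map<String, List<String>> out) {
        if (node instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                String key = String.valueOf(e.getKey());
                String child = path.isEmpty() ? key : path + "." + key;
                if (!isEmpty(e.getValue())) {
                    out.computeIfAbsent(key, k -> new ArrayList<>()).add(child);
                }
                walk(child, e.getValue(), out);
            }
        } else if (node instanceof List) {
            List<?> list = (List<?>) node;
            for (int i = 0; i < list.size(); i++) {
                walk(path + "[" + i + "]", list.get(i), out);
            }
        }
    }

    private static boolean isEmpty(Object v) {
        return v == null
                || (v instanceof String && ((String) v).isEmpty())
                || (v instanceof Map && ((Map<?, ?>) v).isEmpty())
                || (v instanceof List && ((List<?>) v).isEmpty());
    }

    // Share of json1's non-empty keys that appear in json2 at exactly the same paths.
    static BigDecimal similarity(Object json1, Object json2) {
        Map<String, List<String>> k1 = nonEmptyKeyPaths(json1);
        Map<String, List<String>> k2 = nonEmptyKeyPaths(json2);
        if (k1.isEmpty()) {
            return (k2.isEmpty() ? BigDecimal.ONE : BigDecimal.ZERO).setScale(4);
        }
        long matched = k1.entrySet().stream()
                .filter(e -> e.getValue().equals(k2.get(e.getKey())))
                .count();
        return BigDecimal.valueOf(matched)
                .divide(BigDecimal.valueOf(k1.size()), 4, RoundingMode.HALF_UP);
    }
}
```

With recorded {result: {code: 0, msg: "ok", data: {id: 7}}} and replayed {result: {code: 0, msg: "", data: {id: 8}}}, the recorded side has five non-empty keys and msg is empty on the replay side, so the score is 4/5 = 0.8000. The changed id value does not affect this dimension; value differences are handled by value similarity.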

3. Structure Difference Check

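// Structure similarity: path coverage checked in both directions; the lower ratio wins.
// flattenJsonStructure is a helper method not shown in the source.
public static BigDecimal compareStructuralIntegrity(JSONObject json1, JSONObject json2){
    Map<String, List<String>> structure1 = flattenJsonStructure(json1);
    Map<String, List<String>> structure2 = flattenJsonStructure(json2);
    int forwardTotal = structure1.size();
    int reverseTotal = structure2.size();
    if (forwardTotal == 0 || reverseTotal == 0){
        // Two empty structures are identical; an empty one shares nothing with a non-empty one.
        return (forwardTotal == reverseTotal ? BigDecimal.ONE : BigDecimal.ZERO).setScale(4);
    }
    int forwardCount = 0;  // paths of json1 that also exist in json2
    for (String key : structure1.keySet()){
        if (structure2.containsKey(key)) forwardCount++;
    }
    int reverseCount = 0;  // paths of json2 that also exist in json1
    for (String key : structure2.keySet()){
        if (structure1.containsKey(key)) reverseCount++;
    }
    double forwardPct = (double) forwardCount / forwardTotal;
    double reversePct = (double) reverseCount / reverseTotal;
    // Taking the minimum penalizes both missing paths and unexpected extra paths.
    double smaller = Math.min(forwardPct, reversePct);
    return BigDecimal.valueOf(smaller).setScale(4, RoundingMode.HALF_UP);
}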
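Because each shared path is counted once in each direction, the forward and reverse counts in compareStructuralIntegrity are equal. Taking the smaller percentage therefore amounts to dividing the number of shared paths by the size of the larger structure. Here is a minimal set-based sketch of that equivalent form; the class name StructureSimilarity is hypothetical.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.HashSet;
import java.util.Set;

final class StructureSimilarity {

    // min(|S1 ∩ S2| / |S1|, |S1 ∩ S2| / |S2|), which equals |S1 ∩ S2| / max(|S1|, |S2|).
    static BigDecimal similarity(Set<String> paths1, Set<String> paths2) {
        if (paths1.isEmpty() || paths2.isEmpty()) {
            boolean bothEmpty = paths1.isEmpty() && paths2.isEmpty();
            return (bothEmpty ? BigDecimal.ONE : BigDecimal.ZERO).setScale(4);
        }
        Set<String> shared = new HashSet<>(paths1);
        shared.retainAll(paths2);
        int larger = Math.max(paths1.size(), paths2.size());
        return BigDecimal.valueOf(shared.size())
                .divide(BigDecimal.valueOf(larger), 4, RoundingMode.HALF_UP);
    }
}
```

A replay that adds one field (4 paths instead of 3, with all 3 originals kept) scores 3/4 = 0.7500 regardless of argument order, the same score as a replay that drops one of 4 recorded paths.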

Test Results

With the similarity algorithm, diff failures dropped from 328 to 218, because 110 cases were correctly reclassified as successful. Accuracy reached 86.89%, compared with 53.35% for the system diff.

Result Noise Reduction

The original article includes a table comparing how many diffs were correctly identified by the raw system diff, by the similarity algorithm, and by manual judgment.

Summary

1. Validation Conclusions

The similarity-based diff comparison distinguishes effective from ineffective failures more accurately than the system diff and raises the overall replay success rate.

2. Advantages and Limitations

Advantages

Multi‑dimensional comparison provides detailed analysis.

Accuracy is 33.54 percentage points higher than the system diff (86.89% vs. 53.35%).

Configurable weighting offers flexibility.

Limitations

It currently evaluates only values, keys, and structure, so it cannot take business context into account.

Performance may degrade on large‑scale data sets.

Future Work

Plans include extending the algorithm to more scenarios, optimizing performance for large data volumes, and incorporating business‑scenario awareness to further improve accuracy.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


