CTF AI Challenge Solution Report: Analyzing and Improving a Keras Model for Function Entry Detection
This report details the analysis of a JD security CTF challenge that uses a Keras deep‑learning model to detect function entry points in binary code, describing the environment, manual data processing, model inspection, training adjustments, and optimization steps to achieve correct detection.
The article introduces a recent online CTF competition organized by JD Security and the SeeX forum, where participants were asked to solve the "JD AI CTF Challenge" involving a binary sequence and a Keras model.
Problem description: The challenge provides 200 decimal numbers representing a binary sequence; the task is to identify the single position that should be labeled as a function entry (value 1) using a machine‑learning model, but the original model predicts all zeros.
Environment: The analysis was performed on Windows using x32dbg, Anaconda3, Keras (installed via pip), and VSCode.
Manual data analysis: The numbers fall in the range 1‑256, one above the 0‑255 range of a byte, which suggests each value had been incremented by one; subtracting one recovers the actual byte values. After this adjustment the bytes were loaded into x32dbg and disassembled, and the function entry was located at index 40, at the instruction sub esp, 0xC.
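The decoding step can also be cross-checked with a short script. This is a minimal sketch, not the author's method (the report used x32dbg): the helper names are hypothetical, and the sample input is a made-up stand-in for the real 200 numbers, which are not reproduced here. The byte pattern 83 EC 0C is the standard x86 encoding of sub esp, 0xC.

```python
def decode_challenge_values(values):
    """Undo the +1 offset and return the raw bytes."""
    if any(not 1 <= v <= 256 for v in values):
        raise ValueError("expected every value in the range 1..256")
    return bytes(v - 1 for v in values)


def find_pattern(code, pattern):
    """Return every offset at which `pattern` occurs in `code`."""
    offsets = []
    start = code.find(pattern)
    while start != -1:
        offsets.append(start)
        start = code.find(pattern, start + 1)
    return offsets


# x86 encoding of `sub esp, 0xC`: opcode 83 /5 with an 8-bit immediate.
SUB_ESP_0C = bytes([0x83, 0xEC, 0x0C])

# Hypothetical stand-in for the challenge input (NOT the real data):
# NOP padding with the entry instruction at index 40, shifted by +1.
raw = bytearray([0x90] * 200)
raw[40:43] = SUB_ESP_0C
sample_values = [b + 1 for b in raw]

decoded = decode_challenge_values(sample_values)
print(find_pattern(decoded, SUB_ESP_0C))
```

A pattern search like this only confirms a candidate found by disassembly; the same three bytes could in principle appear inside another instruction.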
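Model inspection: The provided .h5 file is a Keras model built on Theano with an RNN architecture. For the 200‑value input it outputs a 200×2 probability matrix, one row per position and one column per class (non‑entry, entry). Initial predictions showed low confidence for the true entry point (index 40) and high confidence for the non‑entry class everywhere else, which is why the original model labels every position as zero.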
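To make the "all zeros" behaviour concrete, the following sketch summarizes a 200×2 probability matrix of the kind the model returns. It assumes column 1 holds the "function entry" probability and uses a 0.5 decision threshold; both are assumptions, as are the function name and the made-up probabilities. In practice the matrix would come from the loaded model's predict call.

```python
def summarize_predictions(probs, threshold=0.5):
    """Return (best_index, best_score, positions_above_threshold).

    `probs` is a sequence of [p_non_entry, p_entry] rows, one per position.
    Assumption: column 1 is the "function entry" class.
    """
    entry_scores = [row[1] for row in probs]
    best_index = max(range(len(entry_scores)), key=entry_scores.__getitem__)
    above = [i for i, score in enumerate(entry_scores) if score > threshold]
    return best_index, entry_scores[best_index], above


# Made-up probabilities that mimic the situation described in the report:
# the true entry (index 40) scores highest, but still stays below 0.5.
probs = [[0.99, 0.01] for _ in range(200)]
probs[40] = [0.70, 0.30]

best_index, best_score, above = summarize_predictions(probs)
print(best_index, best_score, above)
```

Looking at the highest-scoring position rather than only the thresholded labels shows whether the model has learned a weak signal that retraining could strengthen.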
Hypothesis and improvement: The low confidence most likely stemmed from an imbalanced training set in which almost every label is zero. By augmenting the dataset with additional synthetic samples containing the correct entry pattern (e.g., inserting a function entry at position 168) and retraining, the model’s confidence for the correct entry exceeded 0.5 while other positions remained near zero.
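One way to build such a synthetic sample is sketched below. The NOP filler, the function name, and the reuse of the +1 encoding are assumptions for illustration; the report does not say what surrounded the inserted entry or how many samples were generated.

```python
def make_synthetic_sample(entry_pattern, position, length=200, filler=0x90):
    """Build one (values, labels) training pair.

    values: `length` integers in the challenge's +1 encoding (1..256)
    labels: 0 everywhere except a single 1 at `position`
    Assumption: the sequence is padded with a constant filler byte (NOP).
    """
    if position < 0 or position + len(entry_pattern) > length:
        raise ValueError("entry pattern does not fit at this position")
    raw = bytearray([filler] * length)
    raw[position:position + len(entry_pattern)] = entry_pattern
    values = [b + 1 for b in raw]          # match the challenge encoding
    labels = [0] * length
    labels[position] = 1                   # mark the function entry
    return values, labels


# `sub esp, 0xC` inserted at position 168, as in the report's example.
values, labels = make_synthetic_sample(bytes([0x83, 0xEC, 0x0C]), 168)
print(len(values), labels.index(1))
```

Varying the position across many generated samples gives the model more positive labels to learn from, which is the point of the augmentation.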
Optimization suggestions: Enrich the synthetic dataset with varied instruction orderings (e.g., swapping add/sub) and reduce over‑reliance on specific opcodes such as mov. Also experiment with optimizers other than the default.
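One possible reading of "swapping add/sub" is to emit equivalent encodings of the same stack allocation, so the model does not latch onto a single opcode. The sketch below generates sub esp, N and the equivalent add esp, -N for an 8-bit immediate; the function name is hypothetical, and this interpretation is an assumption rather than something the report spells out.

```python
def equivalent_stack_allocations(size):
    """Return two x86 encodings that both lower ESP by `size` bytes.

    sub esp, imm8  -> 83 EC ib
    add esp, imm8  -> 83 C4 ib   (imm8 is sign-extended, so -size works)
    The two differ in the flags they set, not in their effect on ESP.
    """
    if not 0 < size <= 127:
        raise ValueError("size must fit in a positive signed 8-bit immediate")
    sub_form = bytes([0x83, 0xEC, size])
    add_form = bytes([0x83, 0xC4, (-size) & 0xFF])
    return [sub_form, add_form]


for encoding in equivalent_stack_allocations(0x0C):
    print(encoding.hex())
```

Each variant can then be placed into synthetic samples at different positions to diversify the positive examples.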
The second part of the report reflects on combining AI techniques with reverse‑engineering challenges, noting that while the task is not highly complex, the integration offers an interesting learning experience and highlights the risk of over‑fitting when only a single sample is provided.
JD Tech
Official JD technology sharing platform. All the cutting‑edge JD tech, innovative insights, and open‑source solutions you’re looking for, all in one place.