
Implementing Bilinear Interpolation on FPGA for Neural Network Acceleration

The article explains the principles of bilinear interpolation, why it is needed for smooth image scaling in neural‑network layers such as Interp and Resize, and details FPGA‑specific optimizations—including lookup‑table based coefficient pre‑computation, two‑line BRAM caching, and index‑driven data swapping—to reduce DSP usage and improve throughput.

Qunar Tech Salon

What? What is Bilinear Interpolation?

Bilinear interpolation extends linear interpolation to two dimensions by using the four nearest pixel values around a target point; it produces smoother image scaling compared with nearest‑neighbor methods.
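In symbols, for a target point with fractional offsets (x, y) inside the unit square spanned by its four neighbors, the interpolated value is:

```latex
f(x,y) \approx (1-x)(1-y)\,f(0,0) + x(1-y)\,f(1,0) + (1-x)\,y\,f(0,1) + x\,y\,f(1,1)
```

The four weights sum to one, so the result always lies within the range of the four neighboring pixel values.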

Why? Why is Bilinear Interpolation Needed?

Nearest‑neighbor scaling is fast but creates noticeable blocky artifacts. Bilinear interpolation computes each output pixel as a weighted sum of four surrounding source pixels, yielding smoother enlargements and better visual quality for layers like Interp in YOLOv2/v3 and Faster R‑CNN.

How? How to Implement Bilinear Interpolation on FPGA?

The core algorithm multiplies each of the four source pixel values by a weight derived from the fractional distances to the target point and sums the results. The two key steps are pixel selection and weight calculation.

Pixel selection depends on the input‑output resolution ratio, expressed as a base coefficient. Because sample points sit at pixel centers and the first and last samples of input and output are aligned, the ratio uses (size−1) rather than size, e.g., for scaling 2×2 to 4×4 the coefficient is (2−1)/(4−1) ≈ 0.33.
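As a quick sanity check, the base coefficient can be computed directly (a minimal Python sketch; the function name is illustrative, not from the article):

```python
def base_coeff(in_size, out_size):
    # (size - 1) appears in both numerator and denominator so that the
    # first and last samples of input and output line up exactly
    return (in_size - 1) / (out_size - 1)

print(base_coeff(2, 4))  # 1/3, i.e. ~0.33 for scaling 2x2 up to 4x4
```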

Weight calculation maps each output index to a source coordinate by multiplying it by the base coefficient: the integer part selects the pixel (index) and the fractional part supplies the weight in each dimension. E.g., with coefficient 0.33 and source values 1, 2, 4, 5, target point (2,2) maps to source coordinate (0.67, 0.67), giving 1·(1−0.67)·(1−0.67) + 2·0.67·(1−0.67) + 4·(1−0.67)·0.67 + 5·0.67·0.67 ≈ 3.67.
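The two steps can be combined into a software reference model (a plain‑Python sketch under the corner‑aligned convention above; the function name is illustrative):

```python
def bilinear_resize(src, out_h, out_w):
    # src is a 2-D list of pixel values; returns the resized 2-D list
    in_h, in_w = len(src), len(src[0])
    # base coefficients, one per dimension, using (size - 1)
    ry = (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
    rx = (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
    out = [[0.0] * out_w for _ in range(out_h)]
    for oy in range(out_h):
        sy = oy * ry                      # source row coordinate
        y0 = int(sy)                      # integer part -> pixel index
        fy = sy - y0                      # fractional part -> weight
        y1 = min(y0 + 1, in_h - 1)        # clamp the neighbour row
        for ox in range(out_w):
            sx = ox * rx
            x0 = int(sx)
            fx = sx - x0
            x1 = min(x0 + 1, in_w - 1)
            out[oy][ox] = (src[y0][x0] * (1 - fy) * (1 - fx)
                           + src[y0][x1] * (1 - fy) * fx
                           + src[y1][x0] * fy * (1 - fx)
                           + src[y1][x1] * fy * fx)
    return out

src = [[1.0, 2.0],
       [4.0, 5.0]]
res = bilinear_resize(src, 4, 4)
print(round(res[2][2], 3))  # target (2,2) -> 3.667, matching the worked example
```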

FPGA‑Specific Optimizations

Upgrade 1 – Lookup‑Table (LUT) to Reduce Computation

By pre‑computing the base coefficient and per‑pixel parameters outside the FPGA, division operations are eliminated and only simple multiplications remain. These parameters are stored in BRAM and fetched at runtime.
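The pre‑computation can be modeled on the host side: one table per dimension, with all division done once offline so the accelerator only looks up indices and multiplies by weights (a sketch of the scheme; function name and table layout are assumptions, not the article's actual host code):

```python
def precompute_lut(in_size, out_size):
    # runs on the host CPU: the only division in the whole pipeline
    r = (in_size - 1) / (out_size - 1)
    lut = []
    for o in range(out_size):
        s = o * r
        i = int(s)                        # pixel index (integer part)
        f = s - i                         # weight (fractional part)
        # one entry per output position: (index, clamped neighbour,
        # weight for index, weight for neighbour) -- written into BRAM
        lut.append((i, min(i + 1, in_size - 1), 1.0 - f, f))
    return lut

for entry in precompute_lut(2, 4):
    print(entry)
```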

Upgrade 2 – Two‑Line BRAM Cache to Reduce Access Latency

A dual‑port BRAM holds one row of the source feature map; a second BRAM stores the next row. This allows a 2×2 window to be assembled in a single cycle, avoiding the three‑quarter idle time of a naïve design.
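The two‑row scheme can be modeled in software, with two Python lists standing in for the line BRAMs (an illustrative model, not HDL; names are assumptions):

```python
def make_lut(in_size, out_size):
    # per-dimension table: (index, clamped neighbour, weight pair)
    r = (in_size - 1) / (out_size - 1)
    t = []
    for o in range(out_size):
        s = o * r
        i = int(s)
        f = s - i
        t.append((i, min(i + 1, in_size - 1), 1 - f, f))
    return t

def resize_with_line_cache(src, out_h, out_w):
    in_h, in_w = len(src), len(src[0])
    row_lut = make_lut(in_h, out_h)
    col_lut = make_lut(in_w, out_w)
    cached = None                 # which source rows the two "BRAMs" hold
    line0 = line1 = None
    out = []
    for y0, y1, wy0, wy1 in row_lut:
        if cached != (y0, y1):    # reload the line buffers only on a row change
            line0, line1, cached = src[y0], src[y1], (y0, y1)
        row = []
        for x0, x1, wx0, wx1 in col_lut:
            # the full 2x2 window is available from the two buffers at once
            row.append(line0[x0] * wy0 * wx0 + line0[x1] * wy0 * wx1
                       + line1[x0] * wy1 * wx0 + line1[x1] * wy1 * wx1)
        out.append(row)
    return out

src = [[1.0, 2.0], [4.0, 5.0]]
print(round(resize_with_line_cache(src, 4, 4)[2][2], 3))  # ~3.667, as before
```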

Upgrade 3 – Index‑Driven Data Swapping for Variable Resolutions

Row and column indices generate address and swap signals. When the row index changes, a new line is loaded into BRAM; when the column index changes, the window slides horizontally. This scheme supports arbitrary scaling factors without fixed‑step counters.
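The control side can be sketched as deriving a pulse from each change in the source index (a model of the control logic only, not synthesizable code; names are illustrative):

```python
def swap_signals(indices):
    # assert a pulse whenever the source index advances, so the datapath
    # reloads (rows) or slides (columns) only when needed; no fixed-step
    # counter means any scaling ratio works
    prev = None
    out = []
    for i in indices:
        out.append(i != prev)
        prev = i
    return out

# row indices for scaling 2 rows up to 4 rows with coefficient (2-1)/(4-1):
row_idx = [int(o * (2 - 1) / (4 - 1)) for o in range(4)]   # [0, 0, 0, 1]
print(swap_signals(row_idx))   # pulses on the first and last rows
```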

These techniques together minimize DSP usage, reduce memory bandwidth, and enable high‑throughput interpolation on resource‑constrained FPGA devices, making non‑convolutional layers such as Interp and ROI Align viable for accelerating detection networks like Faster R‑CNN.

Conclusion

1. Pre‑compute resolution‑dependent coefficients and store them in BRAM to avoid runtime division.
2. Use a two‑row BRAM cache with latch registers to fetch a 2×2 window in one cycle.
3. Drive data movement with index‑based signals, allowing flexible resolution support and efficient FPGA resource utilization.

The presented FPGA implementation demonstrates that even GPU‑unfriendly operators can be accelerated effectively, opening the door for high‑accuracy detection models to run efficiently on FPGA platforms.

Tags: Neural Networks, DSP, Hardware Acceleration, FPGA, Bilinear Interpolation, BRAM, Lookup Table
Written by

Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
