
Optimizing Mobile Barcode Scanning Performance: From ZXing Tuning to Deep Learning‑Based Barcode Region Detection

Profiling the Youzan app's ZXing pipeline, eliminating costly image rotation and format conversions, restricting decoding to the two most common 1‑D types, and adding a lightweight deep‑learning barcode‑region detector cut scan latency from 4.1 s to 1.5 s and lifted the success rate from 91 % to 97 %.

Youzan Coder

Background

Barcode scanning via the camera is widely used in mobile scenarios such as payment codes, restaurant ordering, following official accounts via QR code, parcel tracking, and product barcodes. Poor scanning speed and low success rates directly affect conversion and user satisfaction, making scanning performance a key metric for many mobile apps.

Data collected from the Youzan retail app showed an average scanning time of 4.1 s (camera open to successful decode) with a 91 % success rate. The per‑frame processing time was 516 ms, meaning only 2 frames per second were actually processed.

Analysis

The generic scanning page supports both 1‑D and 2‑D codes. A detailed breakdown of the processing pipeline (5 steps) revealed the following time distribution:

T1 – Format conversion (CVPixelBufferRef → CGImageRef): 30 ms (6 %).

T2 – Image rotation to portrait: 130 ms (25 %).

T3 – Cropping to the scanning frame: 0 ms (0 %).

T4 – Conversion to grayscale (ZXLuminanceSource): 58 ms (11 %).

T5 – Decoding: 298 ms (58 %).
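As a quick sanity check, the five stage costs do sum to the 516 ms per‑frame figure quoted above (the function name is ours, for illustration):

```c
/* Stage timings from the profile above, in milliseconds (T1..T5). */
static const int kStageMs[5] = {30, 130, 0, 58, 298};

/* Sum of all stages: the per-frame processing cost. */
int total_frame_ms(void) {
    int total = 0;
    for (int i = 0; i < 5; i++) total += kStageMs[i];
    return total;  /* 516 ms */
}
```

At 516 ms per frame, only about 1000 / 516 ≈ 2 frames can be processed each second, which is why decoding (T5) and rotation (T2) are the first targets.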

Strategy 1: Reduce Decoding Types

Statistical analysis showed that the top two 1‑D barcode types (EAN‑13 and Code‑128) account for 90 % of usage, while QR codes only represent 3 %. By configuring ZXing to decode only these high‑frequency types, the decoding time dropped from 298 ms (all types) to 16 ms (top‑2 types).
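With ZXingObjC (the Objective‑C port that `ZXLuminanceSource` comes from), the restriction is expressed through decode hints. The following is a minimal sketch rather than the article's exact code; `bitmap` is assumed to be the `ZXBinaryBitmap` built for the current frame:

```objc
// Restrict ZXing to the two high-frequency 1-D formats.
ZXDecodeHints *hints = [ZXDecodeHints hints];
[hints addPossibleFormat:kBarcodeFormatEan13];
[hints addPossibleFormat:kBarcodeFormatCode128];

NSError *error = nil;
ZXMultiFormatReader *reader = [ZXMultiFormatReader reader];
ZXResult *result = [reader decode:bitmap hints:hints error:&error];
```

Skipping the remaining format detectors is what takes decoding from 298 ms down to 16 ms.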

Strategy 2: Optimize/Remove Rotation

Rotating the full‑resolution image (1920×1080) costs 130 ms. By first cropping to the scanning window (840×636) and then rotating, the cost falls to 36 ms. Better yet, using the camera API’s videoOrientation to output portrait images eliminates rotation entirely (0 ms).
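A minimal C sketch of a 90° rotation makes the cost model concrete: one read and one write per pixel, so the cost scales with the pixel count.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate a w×h grayscale image 90° clockwise into dst (which is h×w).
   One read and one write per pixel, so the cost is proportional to w*h —
   which is why cropping to the scan window before rotating is cheaper. */
void rotate90_cw(const uint8_t *src, size_t w, size_t h, uint8_t *dst) {
    for (size_t y = 0; y < h; y++)
        for (size_t x = 0; x < w; x++)
            dst[x * h + (h - 1 - y)] = src[y * w + x];
}
```

Cropping first shrinks the workload from 1920×1080 ≈ 2.07 M pixels to 840×636 ≈ 0.53 M, about a quarter of the data, in line with the measured 130 ms → 36 ms drop; the videoOrientation route removes even that remainder.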

Strategy 3: Merge/Eliminate Format Conversions

A custom luminance source was implemented to convert CVPixelBufferRef directly to the format required by ZXing, bypassing intermediate CGImage steps. The key implementation is shown below:

@implementation YZLuminanceSource
- (instancetype)initWithBuffer:(CVPixelBufferRef)buffer left:(size_t)left top:(size_t)top width:(size_t)width height:(size_t)height {
    self = [super initWithWidth:(int)width height:(int)height];
    if (self) {
        size_t selfWidth = self.width;
        size_t selfHeight = self.height;
        size_t offsetX = left;
        size_t offsetY = top;
        size_t bytesPerRow = CVPixelBufferGetBytesPerRow(buffer);
        size_t dataWidth = CVPixelBufferGetWidth(buffer);
        size_t dataHeight = CVPixelBufferGetHeight(buffer);
        NSAssert((offsetX + selfWidth <= dataWidth && offsetY + selfHeight <= dataHeight), @"Crop rectangle does not fit within image data.");
        _data = (int8_t *)malloc(selfWidth * selfHeight * sizeof(int8_t));
        // The buffer must stay locked for the entire time the pixel loop reads it.
        CVPixelBufferLockBaseAddress(buffer, kCVPixelBufferLock_ReadOnly);
        int8_t *baseAddress = (int8_t *)CVPixelBufferGetBaseAddress(buffer);
        // Rows are independent, so process them in parallel; each block
        // iteration walks one row sequentially (row-major, cache-friendly).
        dispatch_apply(selfHeight, dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_LOW, 0), ^(size_t j) {
            for (size_t i = 0; i < selfWidth; i++) {
                // 4 bytes per pixel: assumes a kCVPixelFormatType_32BGRA buffer.
                size_t baseOffset = (j + offsetY) * bytesPerRow + (i + offsetX) * 4;
                uint32_t blue = baseAddress[baseOffset] & 0xFF;
                uint32_t green = baseAddress[baseOffset + 1] & 0xFF;
                uint32_t red = baseAddress[baseOffset + 2] & 0xFF;
                // BT.601 luma in 10-bit fixed point; the weights sum to 1024,
                // so the result is always in 0..255.
                uint32_t luma = (red == green && green == blue) ? red : ((306 * red + 601 * green + 117 * blue + 0x200) >> 10);
                self->_data[i + j * selfWidth] = (int8_t)luma;
            }
        });
        CVPixelBufferUnlockBaseAddress(buffer, kCVPixelBufferLock_ReadOnly);
    }
    return self;
}
@end
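The grayscale conversion above is a 10‑bit fixed‑point version of the BT.601 luma weights: 306/1024 ≈ 0.299, 601/1024 ≈ 0.587, 117/1024 ≈ 0.114, with +0x200 (512) rounding before the shift. Isolated as plain C for clarity (the function name is ours; the argument order matches the buffer's BGRA byte order):

```c
#include <stdint.h>

/* BT.601 luma via 10-bit fixed point: weights 0.299/0.587/0.114 scaled
   by 1024 become 306/601/117; +0x200 rounds before the >>10 shift.
   Because 306 + 601 + 117 = 1024, the result always fits in 0..255. */
uint8_t bgra_to_luma(uint8_t b, uint8_t g, uint8_t r) {
    if (r == g && g == b) return r;  /* already-gray fast path */
    return (uint8_t)((306u * r + 601u * g + 117u * b + 0x200) >> 10);
}
```

Integer multiplies and a shift replace three floating‑point multiplications per pixel, which matters at millions of pixels per frame.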

Additional Optimizations

1. Align memory access with the image's row‑major storage to improve cache locality.

2. Use dispatch_apply for parallel pixel processing, reducing traversal time from 44 ms (column‑wise) to 12 ms.
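The first point can be shown in plain C (a sketch; the `sum_*` names are illustrative). Both traversals compute the same result, but the row‑major one touches memory sequentially while the column‑major one strides by a full row per step and defeats the cache:

```c
#include <stddef.h>
#include <stdint.h>

/* Row-major traversal: inner loop walks consecutive bytes (cache-friendly). */
uint64_t sum_row_major(const uint8_t *img, size_t w, size_t h) {
    uint64_t s = 0;
    for (size_t y = 0; y < h; y++)
        for (size_t x = 0; x < w; x++)
            s += img[y * w + x];
    return s;
}

/* Column-major traversal: inner loop jumps w bytes per step (cache-hostile). */
uint64_t sum_col_major(const uint8_t *img, size_t w, size_t h) {
    uint64_t s = 0;
    for (size_t x = 0; x < w; x++)
        for (size_t y = 0; y < h; y++)
            s += img[y * w + x];
    return s;
}
```

On a 1080p frame the access pattern alone accounts for most of the 44 ms → 12 ms improvement reported above.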

Effect of Optimizations

Overall scanning latency was reduced by ~20× at the algorithm level, but real‑world metrics showed only a 0.9 s reduction in average successful scan time and a 1 % increase in success rate, because user‑side factors (camera focus, motion blur, barcode positioning) dominate the end‑to‑end experience.

Challenges to Success Rate

Issues such as poor binarization on dark backgrounds and heavily rotated codes still cause failures: low‑contrast regions and large rotation angles can render the barcode unreadable to the decoder.

Deep‑Learning‑Based Barcode Region Detection

To address region localization, two approaches were considered:

Traditional morphology (MSER, clustering) – high latency (130 ms @ 640×480, 600 ms @ 1920×1080) and ~89 % accuracy.

Deep learning – object detection to locate the barcode, followed by angle prediction to correct orientation before decoding.

The pipeline: capture image → object‑detection network → crop barcode region → angle‑prediction network → rotate to upright → decode.

Implementation used TensorFlow, with one model shared between iOS and Android, and average inference times of 40 ms on iOS and 60 ms on Android. Although inference adds overhead, it eliminates unnecessary decoding attempts on frames without barcodes, saving overall CPU cycles.

Results After Deployment

Post‑release metrics: average scan time dropped from 4.1 s to 1.5 s, success rate rose from 91 % to 97 %, and merchant complaints virtually disappeared. Users now experience near‑instant recognition as soon as the camera opens.

Takeaways

Instrumented data collection is essential for identifying real bottlenecks.

Simply adopting third‑party libraries is insufficient; deep code analysis can uncover substantial gains.

Bringing machine learning into a classic pipeline (e.g., object detection for barcode‑region localization) can outperform conventional algorithmic tweaks.

Tags: performance, iOS, deep learning, mobile optimization, barcode scanning, ZXing
Written by Youzan Coder

Official Youzan tech channel, sharing technical insights from the Youzan tech team.