Improving OSS Small‑File Access Performance with StrmVol Storage Volumes in Kubernetes
StrmVol storage volumes replace the FUSE-based OSS mount with a virtual block device and a kernel-mode file system, dramatically reducing latency for massive small-file reads in Kubernetes workloads such as AI training datasets. This article demonstrates setup, configuration, and performance testing using Argo Workflows.
Object Storage Service (OSS) is widely used for massive unstructured data, but accessing millions of small files through the traditional FUSE‑based CSI driver incurs high latency due to frequent user‑kernel context switches and metadata overhead.
The StrmVol storage volume, supported by Alibaba Cloud Container Service (ACK), eliminates the FUSE middle‑layer by exposing a virtual block device backed by a kernel‑mode file system such as EROFS, thereby shortening the data path and accelerating read performance for read‑only, small‑file workloads like AI training sets and time‑series log analysis.
Core mechanisms and optimizations
Fast index construction: only file metadata (name, path, size) is synchronized, reducing initialization time.
Memory prefetch: data blocks are prefetched concurrently based on the index, lowering I/O wait.
Kernel-mode file system: reads are served directly from memory, avoiding user-space FUSE overhead; EROFS provides compression and efficient access.
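To make the first two mechanisms concrete, here is a minimal Python sketch (purely illustrative, not the StrmVol implementation): build a lightweight index holding only file metadata, then concurrently load the indexed files into a memory cache so later reads avoid per-file I/O.

```python
# Illustrative sketch of index-then-prefetch (not StrmVol's actual code).
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def build_index(root):
    """Fast index construction: record only name, path, and size."""
    index = []
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        index.append({"name": name, "path": path, "size": os.path.getsize(path)})
    return index

def prefetch(index, workers=4):
    """Concurrently load every file named in the index into a memory cache."""
    cache = {}
    def load(entry):
        with open(entry["path"], "rb") as f:
            cache[entry["name"]] = f.read()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(load, index))
    return cache

if __name__ == "__main__":
    # Simulate a shard of small files, index it, and prefetch it.
    root = tempfile.mkdtemp()
    for i in range(8):
        with open(os.path.join(root, f"img_{i}.jpeg"), "wb") as f:
            f.write(b"x" * 128)
    cache = prefetch(build_index(root))
    print(len(cache))  # 8 files now served from memory
```

In the real system the index and cache live below the virtual block device, so applications see an ordinary read-only file system rather than an explicit cache.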
Applicable scenarios
Read‑only workloads with massive small files (e.g., AI training image sets).
Data stored in OSS that does not require frequent updates.
Random‑read patterns where low latency is critical.
To use StrmVol, deploy the strmvol-csi-driver component from the ACK marketplace. After installation, define a PersistentVolume (PV) and PersistentVolumeClaim (PVC) similar to standard OSS volumes:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-strmvol
spec:
  capacity:
    # Up to 16 TiB can be stored under the OSS mount point.
    storage: 20Gi
  accessModes:
    - ReadOnlyMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: strmvol.csi.alibabacloud.com
    volumeHandle: pv-strmvol
    nodeStageSecretRef:
      name: strmvol-secret
      namespace: default
    volumeAttributes:
      bucket: imagenet
      path: /data
      url: oss-cn-hangzhou-internal.aliyuncs.com
      directMode: "false"
      resourceLimit: "4c8g"
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc-strmvol
  namespace: default
spec:
  accessModes:
    - ReadOnlyMany
  resources:
    requests:
      storage: 20Gi
  volumeName: pv-strmvol
The directMode flag controls whether prefetch and local caching are disabled (useful for pure random-read scenarios). resourceLimit defines the maximum CPU and memory the virtual block device may consume on the node (e.g., "4c8g" = 4 vCPU, 8 GiB RAM).
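The PV above references a `nodeStageSecretRef` named strmvol-secret, which must hold the OSS access credentials. A minimal sketch of that Secret, assuming the key names (`akId`, `akSecret`) follow the convention of Alibaba Cloud's other OSS CSI plugins — check the strmvol-csi-driver documentation for the exact fields it expects:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: strmvol-secret
  namespace: default
stringData:
  # Key names assumed here; replace the placeholders with a RAM user's
  # credentials that have read access to the target bucket.
  akId: <your-access-key-id>
  akSecret: <your-access-key-secret>
```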
Performance testing uses an Argo Workflow that simulates distributed image-set loading. The workflow consists of three stages: listing shard directories, processing each shard in parallel with GNU parallel, and aggregating the timing results.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: distributed-imagenet-training-
spec:
  entrypoint: main
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: "node-type"
                operator: In
                values:
                  - "argo"
  volumes:
    - name: pvc-volume
      persistentVolumeClaim:
        claimName: pvc-strmvol
  templates:
    - name: main
      steps:
        - - name: list-shards
            template: list-imagenet-shards
        - - name: parallel-processing
            template: process-shard
            arguments:
              parameters:
                - name: paths
                  value: "{{item}}"
            withParam: "{{steps.list-shards.outputs.result}}"
        - - name: calculate-statistics
            template: calculate-averages
    - name: list-imagenet-shards
      script:
        image: mirrors-ssl.aliyuncs.com/python:latest
        command: [python]
        source: |
          import subprocess, json
          output = subprocess.check_output("ls /mnt/data", shell=True, text=True)
          files = [f for f in output.split('\n') if f]
          print(json.dumps(files, indent=2))
        volumeMounts:
          - name: pvc-volume
            mountPath: /mnt/data
    - name: process-shard
      inputs:
        parameters:
          - name: paths
      container:
        image: alibaba-cloud-linux-3-registry.cn-hangzhou.cr.aliyuncs.com/alinux3/alinux3:latest
        command: [/bin/bash, -c]
        args:
          - |
            yum install -y parallel
            SHARD_JSON="/mnt/data/{{inputs.parameters.paths}}"
            START_TIME=$(date +%s)
            find "$SHARD_JSON" -maxdepth 1 -name "*.JPEG" -print0 | parallel -0 -j4 'cp {} /dev/null'
            END_TIME=$(date +%s)
            ELAPSED=$((END_TIME - START_TIME))
            mkdir -p /tmp/output
            echo $ELAPSED > /tmp/output/time_shard_{{inputs.parameters.paths}}.txt
        resources:
          requests:
            memory: "4Gi"
            cpu: "1000m"
          limits:
            memory: "4Gi"
            cpu: "2000m"
        volumeMounts:
          - name: pvc-volume
            mountPath: /mnt/data
      outputs:
        artifacts:
          - name: time_shard
            path: /tmp/output/time_shard_{{inputs.parameters.paths}}.txt
            oss:
              key: results/results-{{workflow.creationTimestamp}}/time_shard_{{inputs.parameters.paths}}.txt
            archive: {}
    - name: calculate-averages
      inputs:
        artifacts:
          - name: results
            path: /tmp/output
            oss:
              key: "results/results-{{workflow.creationTimestamp}}"
      container:
        image: registry-vpc.cn-beijing.aliyuncs.com/acs/busybox:1.33.1
        command: [sh, -c]
        args:
          - |
            echo "Merging results..."
            TOTAL_TIME=0
            SHARD_COUNT=0
            for time_file in /tmp/output/time_shard_*.txt; do
              TIME=$(cat $time_file)
              SHARD_ID=${time_file##*_}
              SHARD_ID=${SHARD_ID%.txt}
              echo "Shard $SHARD_ID: $TIME s"
              TOTAL_TIME=$((TOTAL_TIME + TIME))
              SHARD_COUNT=$((SHARD_COUNT + 1))
            done
            if [ $SHARD_COUNT -gt 0 ]; then
              AVERAGE=$((TOTAL_TIME / SHARD_COUNT))
              echo "--------------------------------"
              echo "Total shards: $SHARD_COUNT"
              echo "Total processing time: $TOTAL_TIME s"
              echo "Average processing time: $AVERAGE s/shard"
              echo "Average: $AVERAGE seconds" > /tmp/output/time_stats.txt
            else
              echo "Error: no shard timing data found"
              exit 1
            fi
      outputs:
        artifacts:
          - name: test-file
            path: /tmp/output/time_stats.txt
            oss:
              key: results/results-{{workflow.creationTimestamp}}/time_stats.txt
            archive: {}
The workflow completed in about 21 seconds per shard, for an average of 21 seconds across the four ImageNet sub-directories.
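The aggregation step above can be sanity-checked outside the cluster before embedding it in the workflow. A self-contained local run of the same POSIX arithmetic, using synthetic per-shard timing files (the directory and values here are invented for illustration):

```shell
#!/bin/sh
# Create synthetic per-shard timing files like those the workflow emits,
# then run the same aggregation logic locally.
OUT=$(mktemp -d)
echo 20 > "$OUT/time_shard_a.txt"
echo 22 > "$OUT/time_shard_b.txt"

TOTAL_TIME=0
SHARD_COUNT=0
for time_file in "$OUT"/time_shard_*.txt; do
  TIME=$(cat "$time_file")
  TOTAL_TIME=$((TOTAL_TIME + TIME))
  SHARD_COUNT=$((SHARD_COUNT + 1))
done

# Integer average, exactly as in the busybox container.
AVERAGE=$((TOTAL_TIME / SHARD_COUNT))
echo "Average: $AVERAGE seconds"   # prints "Average: 21 seconds"
```

Note the average is integer division, matching the workflow's busybox script; fractional seconds are truncated.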
Alibaba Cloud also provides an open‑source implementation based on the containerd/overlaybd project, which can be combined with OCI image volumes for read‑only data mounts; see the KubeCon Europe 2025 talk for details.
In summary, StrmVol offers a lightweight, kernel-direct storage solution that dramatically improves read latency for massive small-file, read-only workloads on OSS, with simple CSI deployment, configurable resource limits, and performance gains demonstrated via Argo Workflows.
Alibaba Cloud Infrastructure