
Performance Comparison of Alibaba Cloud OSSFS 1.0 and 2.0 for AI and High‑Throughput Workloads

This article analyzes the design differences between OSSFS 1.0 and OSSFS 2.0, presents detailed performance benchmarks for sequential and concurrent file operations, and demonstrates how the optimized OSSFS 2.0 client improves AI model loading and Kubernetes storage integration.

Alibaba Cloud Infrastructure

Alibaba Cloud Object Storage Service (OSS) provides massive, secure, low‑cost, high‑reliability cloud storage with up to 100 Gbps download bandwidth per account in multiple regions. To enable Kubernetes workloads to read and write OSS data as if it were a local file system, a FUSE‑based client translates POSIX operations into RESTful OSS requests.

OSSFS 1.0, derived from the open‑source S3FS‑FUSE project, offered full POSIX semantics such as UID metadata, soft‑link handling, extended attributes, and a local disk cache for complete write capabilities. However, AI workloads demand large‑file sequential reads/writes, massive small‑file concurrency, and minimal metadata overhead, which the 1.0 design could not meet efficiently.

OSSFS 2.0 was introduced to address these AI‑centric requirements. It trades full POSIX compatibility for performance: it keeps only essential attributes (mtime, size) and reduces HeadObject metadata calls; it is rebuilt on the FUSE3 low‑level API with a more flexible metadata cache; it uses lower‑level kernel APIs to cut thread switches and data copies; and it incorporates Alibaba Cloud's coroutine technology for higher concurrency and lower CPU usage.

Performance testing was conducted with the FIO tool on a single node, comparing OSSFS 1.0 and 2.0 for sequential write of a 100 GB file, sequential read (single‑thread and 4‑thread), and 128‑thread concurrent reads of 100 000 × 128 KB small files. OSSFS 2.0 achieved roughly 18× higher write bandwidth, 8.5× higher single‑thread read bandwidth, 5× higher 4‑thread read bandwidth, and over 280× higher small‑file concurrent read bandwidth.
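A comparable benchmark setup can be expressed as an fio job file along the following lines. The mount point, file size, and job mix are illustrative placeholders, not the exact configuration used in the tests above; `stonewall` serializes the jobs so the read does not overlap the write.

```
[global]
directory=/mnt/oss     ; placeholder OSSFS mount point
bs=1M
size=100G

[seq-write]
rw=write
numjobs=1

[seq-read-4t]
stonewall
rw=read
numjobs=4
```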

In an AI inference scenario, loading a safetensors model (Qwen‑2.5‑72B, ~134.5 GB) requires patterned random reads with offset jumps, and OSSFS 2.0 includes optimizations for exactly this access pattern. Tests on an ecs.g7.32xlarge node (128 vCPU, 512 GiB) using the vLLM inference library to load Hugging Face safetensors weights showed significant speedups compared to OSSFS 1.0, which had to run in direct‑read mode with the following mount options:

-odirect_read -odirect_read_prefetch_chunks=256 -odirect_read_prefetch_limit=8192 -odirect_read_backward_chunks=256 -odirect_read_chunk_size=16
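For context, these flags attach to an ordinary ossfs mount invocation; the bucket name, mount point, and endpoint below are placeholders:

```
ossfs examplebucket /mnt/oss \
  -ourl=oss-cn-hangzhou-internal.aliyuncs.com \
  -odirect_read -odirect_read_prefetch_chunks=256 \
  -odirect_read_prefetch_limit=8192 \
  -odirect_read_backward_chunks=256 \
  -odirect_read_chunk_size=16
```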

OSSFS 2.0 uses its default parameters, eliminating the need for these complex flags.
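To illustrate why safetensors loading stresses the file system this way (a sketch of the file format, not of OSSFS internals), the minimal Python example below writes and reads a safetensors‑style file: an 8‑byte little‑endian header length, a JSON header, then raw tensor data addressed by `data_offsets`. Loading one tensor means seeking straight past unrelated bytes, which is the offset‑jump pattern OSSFS 2.0's prefetching is tuned for. Function names and the tiny demo tensors are illustrative.

```python
import json
import struct

def write_demo_safetensors(path):
    """Write a minimal safetensors-style file: 8-byte little-endian
    header length, JSON header, then the raw tensor payload."""
    t0 = bytes(range(16))
    t1 = bytes(range(16, 48))
    header = {
        "t0": {"dtype": "U8", "shape": [16], "data_offsets": [0, 16]},
        "t1": {"dtype": "U8", "shape": [32], "data_offsets": [16, 48]},
    }
    header_json = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_json)))
        f.write(header_json)
        f.write(t0 + t1)

def load_tensor_bytes(path, name):
    """Read one tensor by seeking directly to its data_offsets --
    the offset-jump read pattern the FUSE client must serve."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
        begin, end = header[name]["data_offsets"]
        f.seek(8 + header_len + begin)  # jump over preceding tensors
        return f.read(end - begin)
```

On a FUSE mount, every such seek-and-read becomes a ranged request against the backing object, which is why prefetch tuning (or OSSFS 2.0's defaults) matters so much here.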

For deployment in Alibaba Cloud Container Service for Kubernetes (ACK), the official CSI driver now supports OSSFS 2.0. The following YAML defines a PersistentVolume (PV) and PersistentVolumeClaim (PVC) that mount an OSS bucket using the OSSFS 2.0 client:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-ossfs2
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadOnlyMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: ossplugin.csi.alibabacloud.com
    volumeHandle: pv-ossfs2  # must match PV name
    nodePublishSecretRef:
      name: oss-secret
      namespace: default
    volumeAttributes:
      fuseType: ossfs2  # indicates OSSFS 2.0
      bucket: cnfs-oss-test
      path: /subpath
      url: oss-cn-hangzhou-internal.aliyuncs.com
      otherOpts: "-o close_to_open=false"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-ossfs2
  namespace: default
spec:
  accessModes:
    - ReadOnlyMany
  resources:
    requests:
      storage: 20Gi
  volumeName: pv-ossfs2

After applying these manifests, applications can access OSS data via the PVC, with support for both static and dynamic provisioning.
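As a usage sketch, a workload mounts the PVC like any other volume. The pod name, image, and mount path below are placeholders; `readOnly: true` matches the `ReadOnlyMany` access mode declared above:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: oss-reader        # placeholder name
  namespace: default
spec:
  containers:
    - name: app
      image: busybox      # placeholder image
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: oss-data
          mountPath: /data
          readOnly: true
  volumes:
    - name: oss-data
      persistentVolumeClaim:
        claimName: pvc-ossfs2
```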

Conclusion: OSSFS 2.0 delivers substantial performance gains for machine‑learning, autonomous‑driving, and genomics workloads, with benchmark throughput ranging from several times to over 280× that of OSSFS 1.0. ACK clusters have already validated these improvements in real‑world stress tests, confirming the client's stability under high‑concurrency, high‑bandwidth conditions.

Appendix: a selection table recommends OSSFS 1.0 for general read/write with permission needs, OSSFS 2.0 for read‑only or sequential‑append scenarios (e.g., inference, training data loading), and StrmVol for massive small‑file read‑only workloads.
