OneFlow Coop: Joint Optimization of Dynamic‑Graph Recomputation and Memory Allocation
This article introduces OneFlow Coop, a memory-optimization technique that jointly optimizes dynamic-graph recomputation strategies and GPU memory allocation. It analyzes the limitations of existing DTR methods, proposes three modules (recomputable in-place, op-guided tensor allocation, and layout-aware eviction), and demonstrates superior experimental results.
The article presents OneFlow Coop, a novel approach that combines dynamic tensor rematerialization (DTR) with memory-allocation strategies to reduce GPU memory consumption during neural-network training.
It first explains the background of DTR, noting that most GPU memory is occupied by intermediate feature tensors rather than model parameters, and describes how recomputation can free memory by re‑evaluating released tensors during the backward pass.
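The core idea behind recomputation can be sketched in a few lines: a tensor keeps the recipe (operation plus inputs) that produced it, so its storage can be released and rebuilt on demand. The sketch below is a toy simplification for illustration; the class and function names are hypothetical and not OneFlow's actual API.

```python
# Toy sketch of tensor recomputation (a hypothetical simplification,
# not OneFlow's implementation). A tensor remembers how it was made,
# so its storage can be evicted and recomputed when touched again.

class RematTensor:
    def __init__(self, data, op=None, inputs=()):
        self.data = data        # payload; set to None when evicted
        self.op = op            # function that produced this tensor
        self.inputs = inputs    # parent RematTensors

    def evict(self):
        # Free storage but keep the recipe; leaf tensors (no op) stay.
        if self.op is not None:
            self.data = None

    def materialize(self):
        # Recompute from parents if the storage was released.
        if self.data is None:
            self.data = self.op(*[t.materialize() for t in self.inputs])
        return self.data


def apply(op, *inputs):
    # Run an op eagerly and record its provenance for later recomputation.
    out = op(*[t.materialize() for t in inputs])
    return RematTensor(out, op=op, inputs=inputs)


# Example chain: x -> y = x * 2 -> z = y + 1
x = RematTensor(3.0)                  # leaf; cannot be evicted
y = apply(lambda a: a * 2, x)
z = apply(lambda a: a + 1, y)
y.evict()                             # release y's storage under memory pressure
print(y.materialize())                # -> 6.0, rebuilt from x on demand
```

During the backward pass, touching an evicted tensor triggers `materialize`, which walks the recipe chain until it reaches tensors that are still resident.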
The limitations of existing DTR methods are discussed: greedy selection can cause fragmented memory, ignore tensor ordering, and conflict with in‑place operations, leading to inefficient memory usage and higher recomputation costs.
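The fragmentation problem can be made concrete with a small numeric example: a cost-only policy may free plenty of memory in total while leaving no contiguous hole large enough to serve a request. The pool layout and sizes below are invented for illustration.

```python
# Illustration of fragmentation under cost-only eviction (made-up pool).
# blocks: list of (state, size_in_MB) in address order; "free" or a name.

def largest_hole(blocks):
    """Largest run of adjacent free memory in an address-ordered pool."""
    best = run = 0
    for state, size in blocks:
        run = run + size if state == "free" else 0
        best = max(best, run)
    return best

pool = [("free", 100), ("A", 100), ("pinned", 100), ("B", 100), ("free", 100)]
# Suppose a cost-only policy evicts A and B because they are cheapest:
evicted = [("free", s) if st in ("A", "B") else (st, s) for st, s in pool]
print(sum(s for st, s in evicted if st == "free"))  # 400 MB free in total
print(largest_hole(evicted))                        # -> 200: largest hole
```

Despite 400 MB being free overall, a 300 MB allocation still fails because the pinned block splits the free space into two 200 MB holes; this is exactly the layout information that cost-only eviction ignores.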
Coop addresses these issues through three modules: (1) recomputable in‑place, which enables tensors to share memory while remaining recomputable; (2) op‑guided tensor allocation, which places low‑cost tensors and high‑cost tensors on opposite ends of the memory pool based on operation type; and (3) layout‑aware eviction, which uses a sliding‑window algorithm to find the optimal contiguous free‑memory block with minimal release cost, reducing the search complexity from O(n²) to O(n).
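The sliding-window search in module (3) can be sketched as a two-pointer scan over the pool in address order: grow the window until it covers the requested size, shrink it from the left while coverage holds, and track the window with the lowest total eviction cost. This is an illustrative sketch of the idea, not Coop's code, and it simplifies by treating each tensor as a single block with a scalar cost (free blocks carry cost 0).

```python
# Sliding-window search for the cheapest contiguous region to evict
# (illustrative sketch of layout-aware eviction, not Coop's source).
# blocks: (size_in_bytes, eviction_cost) in address order; free = cost 0.

def cheapest_window(blocks, need):
    """Return (start, end) indices of the contiguous block range that
    covers `need` bytes with minimal total eviction cost, or None."""
    best, best_cost = None, float("inf")
    size = cost = 0
    left = 0
    for right, (sz, c) in enumerate(blocks):
        size += sz
        cost += c
        # Shrink from the left while the window still covers the request.
        while size - blocks[left][0] >= need:
            size -= blocks[left][0]
            cost -= blocks[left][1]
            left += 1
        if size >= need and cost < best_cost:
            best, best_cost = (left, right), cost
    return best

# Pool: [free 100 | tensor 200 cost 5 | free 300 | tensor 100 cost 1 | free 50]
pool = [(100, 0), (200, 5), (300, 0), (100, 1), (50, 0)]
print(cheapest_window(pool, 400))   # -> (2, 3): evict only the cost-1 tensor
```

Because each block enters and leaves the window at most once, the scan is O(n), versus the O(n²) of checking every candidate interval.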
Experimental results show that Coop consistently achieves lower memory fragmentation and faster search times compared to traditional DTR and the DTE variant across multiple models, while maintaining comparable computational overhead.
The article concludes with a brief Q&A, confirming OneFlow’s support for both dynamic and static graphs, its compatibility with PyTorch APIs, and the applicability of Coop to large‑model training and scenarios with limited GPU memory.
DataFunSummit