PHP Implementation of Simple Multi‑way Merge Sort for Large Files
This article explains how to implement a simple multi‑way merge sort in PHP to sort a massive 1 GB file using only 100 MB of memory, detailing the file‑splitting, incremental merging steps, and providing a complete PHP code demo.
Sorting extremely large files with limited memory is a common interview and engineering challenge. This guide shows how to sort a 1 GB file containing unordered numbers while only using about 100 MB of RAM by applying a multi‑way merge sort algorithm written in PHP.
The process consists of three main stages: (1) read the large file line by line, group every 10,000 lines, sort each group, and write the sorted chunks to temporary files (e.g., t1.txt , t2.txt , …); (2) open all temporary files, read the first line from each, place those values into a temporary array, repeatedly extract the smallest value, write it to the final result file, and replace the extracted value with the next line from the same file; (3) continue this merge loop until all temporary files are exhausted, producing a fully sorted output.
The following PHP code demonstrates a minimal skeleton of this multi‑way merge approach. In a production setting the "min" selection could be optimized with a fixed‑size min‑heap or a priority queue that also tracks the originating file.
function multiWaySort()
{
// Read all file descriptors
$fds = [];
$dir = scandir('./runtime/txt/');
foreach ($dir as $file) {
if ($file != '.' && $file != '..') {
$fds[] = fopen('./runtime/txt/' . $file, 'rb');
}
}
// Read the first line of each file into a temporary sorting array
$tmpNums = [];
foreach ($fds as $fd) {
$tmpNums[] = (int)fgets($fd);
}
$nums = [];
$count = 0;
while (true) {
if (empty($tmpNums)) break;
// Put the smallest value into the temporary storage array
$value = min($tmpNums);
$nums[] = $value;
// Read the next line from the file that provided the smallest value
$idx = array_search($value, $tmpNums);
$next = fgets($fds[$idx]);
if (!$next) {
unset($tmpNums[$idx]);
unset($fds[$idx]);
} else {
$tmpNums[$idx] = (int)$next;
}
// When the temporary storage array reaches a certain size, flush it to the result file
if (count($nums) == 20) {
foreach ($nums as $value) {
$f = fopen('./runtime/result.txt', 'ab+');
fwrite($f, $value . PHP_EOL);
}
$nums = [];
}
if ($count == 4999999) {
continue;
}
$count++;
}
}By adapting the min‑value extraction to a more efficient data structure such as a binary heap or SPLPriorityQueue, the algorithm can handle even larger datasets while keeping memory usage low.
php中文网 Courses
php中文网's platform for the latest courses and technical articles, helping PHP learners advance quickly.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.