Fundamentals 8 min read

Fragment Deduplication Support in erofs-utils mkfs

This article explains the implementation of fragment deduplication in the erofs-utils mkfs tool, describing the hash‑based fragment lookup, the conditions for fixing fragments during compression, the generation of additional extents, and the integration of these steps into the existing packing workflow.

Coolpad Technology Team
Coolpad Technology Team
Coolpad Technology Team
Fragment Deduplication Support in erofs-utils mkfs

We recently added a fragment deduplication feature to erofs-utils mkfs, referenced in [PATCH v7] erofs-utils: mkfs: support fragment deduplication . The core idea is to deduplicate fragments before compression to avoid writing duplicate data into packed inodes, and if a pcluster is not full after deduplication, attempt to read fragments again for possible repair.

Finding fragments

A hash table is defined as an array of linked lists:

+static struct list_head dupli_frags[FRAGMENT_HASHSIZE];

Each node in the list represents a searchable fragment:

+struct erofs_fragment_dedupe_item {
+       struct list_head        list;
+       unsigned int            length, nr_dup;
+       erofs_off_t             pos;
+       u8                      data[];
+};

The fields length , pos , and data describe a fragment set by z_erofs_pack_fragments() , while nr_dup stores the number of duplicates found and links the list into the hash table.

Because fragments may be partially duplicated, a checksum is used as the hash value. The last EROFS_TOF_HASHLEN bytes of the fragment are hashed with erofs_crc32c() :

#define EROFS_TOF_HASHLEN              16
+#define FRAGMENT_HASHSIZE              65536
+#define FRAGMENT_HASH(c)               ((c) & (FRAGMENT_HASHSIZE - 1))

Matching hash values are compared with memcmp , and then a byte‑wise backward comparison determines the longest matching suffix:

+       list_for_each_entry(cur, head, list) {
+               unsigned int e1, mn, i = 0;
+               DBG_BUGON(cur->length <= EROFS_TOF_HASHLEN);
+               e1 = cur->length - EROFS_TOF_HASHLEN;
+               if (memcmp(cur->data + e1, data + e2, EROFS_TOF_HASHLEN))
+                       continue;
+               mn = min(e1, e2);
+               while (i < mn && cur->data[e1 - i - 1] == data[e2 - i - 1])
+                       ++i;
+               if (i && (!di || i + EROFS_TOF_HASHLEN > di->nr_dup)) {
+                       cur->nr_dup = i + EROFS_TOF_HASHLEN;
+                       di = cur;
+                       /* full match */
+                       if (i == mn)
+                               break;
+               }
+       }

When to fix fragments

If deduplication finds a fragment, the original packing logic in vle_compress_one() cannot see the packed data, so a fix is required when the pcluster is not completely filled:

} else if (may_packing && len == ctx->e.length &&
+                          ret < ctx->pclustersize &&
+                          (!inode->fragment_size || fix_dedupedfrag)) {

The fix involves reading the remaining fragment data into the queue and retrying compression:

+                       if (may_packing && len == ctx->e.length &&
+                           (ret & (EROFS_BLKSIZ - 1)) &&
+                           ctx->tail < sizeof(ctx->queue)) {
+                               ctx->pclustersize =
+                                       BLK_ROUND_UP(ret) * EROFS_BLKSIZ; // fill remaining part
+                               goto fix_dedupedfrag;
+                       }

At the fix label the remaining size is added to the queue, and the context is marked for a second pass:

+fix_dedupedfrag:
+       DBG_BUGON(!inode->fragment_size);
+       ctx->remaining += inode->fragment_size; // need to read fragment
+       ctx->e.length = 0;
+       ctx->fix_dedupedfrag = true;
+       return 0;

Generating fragment extent

After a successful deduplication, an extra extent must be generated for the fragment data that was not compressed:

+       /* generate an extent for the deduplicated fragment */
+       if (inode->fragment_size && !ctx.fragemitted) {
+               z_erofs_write_indexes(&ctx);
+               ctx.e.length = inode->fragment_size;
+               ctx.e.compressedblks = 0;
+               ctx.e.raw = true;
+               ctx.e.partial = false;
+               ctx.e.blkaddr = ctx.blkaddr;
+       }

The algorithm distinguishes three cases: (1) duplicate fragments that need no fix, (2) duplicate fragments that require fixing, and (3) fragments fully deduped during compression.

Committing fragments

The original packing function z_erofs_pack_fragments() set advisory flags ( z_advise ) and filesystem features ( sbi.feature ). These actions are now moved to a new function z_erofs_fragments_commit() to be called after deduplication and packing.

Test results

Partial test results for PATCH v5 show that the new deduplication logic behaves as expected, with only minor differences from the final version.

Todos

Future work includes improving compression efficiency for larger pclusters, adding fragment support to dump.erofs and fsck.erofs , and further performance tuning.

KernelcompressionfilesystemEROFSfragment deduplication
Coolpad Technology Team
Written by

Coolpad Technology Team

Committed to advancing technology and supporting innovators. The Coolpad Technology Team regularly shares forward‑looking insights, product updates, and tech news. Tech experts are welcome to join; everyone is invited to follow us.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.