
WebAssembly with Emscripten: High‑Performance MD5 Hashing and Archive Extraction in the Browser

This article demonstrates how to leverage WebAssembly and Emscripten to compile C code for high‑performance MD5 hashing and archive (zip/7z) parsing in the browser, covering library selection, memory management, file I/O via WorkerFS, async processing, and integration of C functions with JavaScript.

NetEase Game Operations Platform

In a recent project the author needed to compute file MD5 hashes and parse compressed archive directories (including 7z) directly in the browser. JavaScript libraries such as spark‑md5 were too slow for gigabyte‑size files, so the solution switched to C libraries compiled to WebAssembly.

WebAssembly (Wasm) is a low‑level, binary format that runs in browsers with near‑native performance. By compiling C/C++/Rust code to Wasm, developers can reuse existing high‑performance libraries.

Hello World

A minimal C program is compiled with Emscripten:

#include <stdio.h>
int main() {
  printf("hello world\n");
  return 0;
}

Using the Docker image trzeci/emscripten the compilation command is:

docker run --rm -v $(pwd):/working trzeci/emscripten \
  emcc /working/main.c -o /working/index.html

Emscripten produces three files: index.wasm (the binary), index.js (JS glue code) and index.html (entry page). Serve them from a static HTTP server to load the module, since browsers generally refuse to fetch .wasm files from file:// URLs.

Loading WebAssembly

To load the compiled module as a reusable UMD module, compile with:

docker run --rm -v $(pwd):/working trzeci/emscripten \
  emcc /working/main.c -o /working/cutils.js \
  -s MODULARIZE=1 -s EXPORT_NAME=CUtils

Then include the generated script and instantiate:

<script src="path/to/cutils.js"></script>
<script>
  const Module = CUtils({
    onRuntimeInitialized: () => {
      // module ready
    }
  });
</script>

JavaScript calling C functions (MD5 example)

The C MD5 implementation provides MD5_Init, MD5_Update and MD5_Final. After compiling with the appropriate EXPORTED_FUNCTIONS and EXTRA_EXPORTED_RUNTIME_METHODS settings (newer Emscripten releases name the latter EXPORTED_RUNTIME_METHODS), the functions are invoked from JavaScript via Module.ccall or Module.cwrap:

const STRUCT_MD5_CTX_SIZE = 152;
const pMd5Ctx = Module.ccall('malloc', 'number', ['number'], [STRUCT_MD5_CTX_SIZE]);
// … call MD5_Init, MD5_Update, MD5_Final via ccall

The full MD5 calculation routine allocates memory for the context, the input buffer, and the result, copies the file data into Wasm memory, runs the C functions, extracts the 16‑byte digest, and frees the allocations.
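
The routine described above might look like the following sketch. It is not the author's code: the name md5OfBytes is hypothetical, and it assumes the build exports _malloc, _free, _MD5_Init, _MD5_Update and _MD5_Final and exposes HEAPU8. It calls the exports directly rather than through ccall, which behaves the same for purely numeric arguments.

```javascript
// Sketch only: md5OfBytes is a hypothetical helper, not part of Emscripten.
const STRUCT_MD5_CTX_SIZE = 152; // value from the article; depends on the C library
const MD5_DIGEST_SIZE = 16;

function md5OfBytes(mod, bytes) {
  // Allocate the context, the input buffer, and the 16-byte result.
  const ctxPtr = mod._malloc(STRUCT_MD5_CTX_SIZE);
  const dataPtr = mod._malloc(bytes.length);
  const resultPtr = mod._malloc(MD5_DIGEST_SIZE);
  try {
    mod.HEAPU8.set(bytes, dataPtr); // copy the input into Wasm memory
    mod._MD5_Init(ctxPtr);
    mod._MD5_Update(ctxPtr, dataPtr, bytes.length);
    mod._MD5_Final(resultPtr, ctxPtr);
    // slice() copies the digest out, so it survives the free() calls below.
    const digest = mod.HEAPU8.slice(resultPtr, resultPtr + MD5_DIGEST_SIZE);
    return Array.from(digest, (b) => b.toString(16).padStart(2, '0')).join('');
  } finally {
    mod._free(ctxPtr);
    mod._free(dataPtr);
    mod._free(resultPtr);
  }
}
```

The try/finally keeps the three allocations from leaking if any call throws. A production version would also check each _malloc result for 0.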

Memory and ArrayBuffer

WebAssembly memory is an ArrayBuffer. Emscripten exposes typed‑array views such as Module.HEAP8 and Module.HEAPU8. Files are read with FileReader (or FileReaderSync in a worker) and copied into Wasm memory via Module.HEAP8.set.
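
To illustrate how the views relate, the sketch below uses a plain ArrayBuffer as a stand‑in for the Wasm memory (an assumption made so the snippet runs without a compiled module). The names heap8 and heapU8 are illustrative counterparts of Module.HEAP8 and Module.HEAPU8.

```javascript
// A plain ArrayBuffer stands in for the Wasm linear memory here.
const buffer = new ArrayBuffer(16);
const heap8 = new Int8Array(buffer);   // signed view, like Module.HEAP8
const heapU8 = new Uint8Array(buffer); // unsigned view, like Module.HEAPU8

// Copy two "file" bytes to offset 4, as set() would do with a malloc'd pointer.
const fileBytes = new Uint8Array([0xff, 0x01]);
heapU8.set(fileBytes, 4);

// Both views alias the same bytes; only the interpretation differs.
const signedByte = heap8[4];    // -1
const unsignedByte = heapU8[4]; // 255
```

Because both views share one buffer, a write through either is visible through the other. For raw file bytes the unsigned view is usually the less surprising one to read back from.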

WorkerFS for large files

To avoid loading an entire file into memory, Emscripten’s WORKERFS provides a read‑only file system that streams File or Blob objects inside a Web Worker. The C side can then read the file in chunks:

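#include <stdio.h>
#include <stdlib.h>
#include "md5.h"  /* assumed header that declares MD5_CTX and the MD5_* functions */

void md5(const char *path, unsigned char *md5_result) {
  const size_t CHUNK_SIZE = 16 * 1024 * 1024;  /* read 16 MiB at a time */
  MD5_CTX md5_ctx;
  unsigned char *buff = malloc(CHUNK_SIZE);
  FILE *stream = fopen(path, "rb");  /* binary mode */
  size_t read_size;
  if (buff == NULL || stream == NULL) {  /* on failure, leave md5_result untouched */
    if (stream != NULL) fclose(stream);
    free(buff);
    return;
  }
  MD5_Init(&md5_ctx);
  while ((read_size = fread(buff, 1, CHUNK_SIZE, stream)) > 0) {
    MD5_Update(&md5_ctx, buff, read_size);
  }
  MD5_Final(md5_result, &md5_ctx);
  fclose(stream);
  free(buff);
}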

In JavaScript the worker mounts the file:

Module.FS.mkdir('/working');  // the mount point must exist before mounting
Module.FS.mount(Module.FS.filesystems.WORKERFS, {files: [file]}, '/working');
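
Putting the worker side together, a sketch of the whole call might look like this. The function name hashFileInWorker is hypothetical, and the code assumes the C function md5 shown earlier is exported, that FS and ccall are exported as runtime methods, and that the module was linked with WORKERFS support.

```javascript
// Sketch only: hashFileInWorker is a hypothetical helper meant to run in a Web Worker.
function hashFileInWorker(mod, file, mountPoint = '/working') {
  const DIGEST_BYTES = 16;
  mod.FS.mkdir(mountPoint); // the mount point must exist first
  mod.FS.mount(mod.FS.filesystems.WORKERFS, { files: [file] }, mountPoint);
  const resultPtr = mod._malloc(DIGEST_BYTES);
  try {
    // WORKERFS exposes each File under the mount point by its name.
    const path = `${mountPoint}/${file.name}`;
    mod.ccall('md5', null, ['string', 'number'], [path, resultPtr]);
    const digest = mod.HEAPU8.slice(resultPtr, resultPtr + DIGEST_BYTES);
    return Array.from(digest, (b) => b.toString(16).padStart(2, '0')).join('');
  } finally {
    mod._free(resultPtr);
    mod.FS.unmount(mountPoint);
  }
}
```

Unmounting in the finally block lets the same worker hash another file later without the mount point colliding.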

Parallel optimisation

Reading the next chunk while the current chunk is being hashed reduces total time. An AsyncGenerator (makeBlobIterator) yields file slices ahead of the MD5 update loop, allowing overlapping I/O and computation.
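
The article does not show makeBlobIterator itself, so the following is only a guess at its shape. It assumes the input offers size and slice(start, end).arrayBuffer(), as Blob and File do in current browsers, and it keeps exactly one read in flight ahead of the consumer.

```javascript
// Hypothetical sketch of makeBlobIterator: yields chunks while prefetching the next one.
async function* makeBlobIterator(blob, chunkSize) {
  const readChunk = (offset) =>
    blob.slice(offset, offset + chunkSize).arrayBuffer();

  let offset = 0;
  let pending = offset < blob.size ? readChunk(offset) : null;
  while (pending !== null) {
    const current = await pending;
    offset += chunkSize;
    // Start reading the next chunk before handing this one to the consumer,
    // so the read can proceed while the consumer is busy hashing.
    pending = offset < blob.size ? readChunk(offset) : null;
    yield new Uint8Array(current);
  }
}
```

A consumer would iterate with for await, copy each chunk into Wasm memory, and call MD5_Update on it. By then the next read has already been issued.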

Using third‑party libraries (libarchive)

To add a C library such as libarchive, compile it with Emscripten using emconfigure and emmake:

emconfigure ./configure
emmake make
emmake make install

The resulting Wasm module can be used to extract archives, with callbacks to JavaScript for each entry.

C calling JavaScript callbacks

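Emscripten’s addFunction registers a JavaScript function in the Wasm function table and returns a pointer that C code can invoke. Current toolchains require a signature string as the second argument, and the build typically needs -s ALLOW_TABLE_GROWTH=1 so the table can accept new entries. A char* argument arrives in JavaScript as a numeric pointer, so the callback decodes it with UTF8ToString. Example:

// JavaScript
function handlePathname(pathPtr) {
  // C passes a char* (an offset into Wasm memory), so decode it first
  console.log(Module.UTF8ToString(pathPtr));
}
// 'vi': returns void, takes one 32-bit integer (the pointer)
const handlePathnamePtr = Module.addFunction(handlePathname, 'vi');
Module.ccall('extract', null, ['string', 'number'], [path, handlePathnamePtr]);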

The C signature expects a function pointer void (*on_pathname)(const char*).

64‑bit values and JavaScript limits

Standard C long is 32‑bit in Emscripten, which prevents file offsets above 2 GB. libarchive uses 64‑bit values (int64_t) in its callbacks. Since JavaScript numbers lose integer precision beyond 53 bits, the solution passes pointers to 64‑bit values and reads and writes them directly in Wasm memory using the helper functions setInt64 and getInt64.
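
The article names setInt64 and getInt64 but does not list them, so the versions below are a hypothetical reconstruction. They assume BigInt support in DataView and take the heap's ArrayBuffer (for example Module.HEAPU8.buffer) as an explicit argument. Wasm memory is little‑endian, hence the true flag.

```javascript
// Hypothetical helpers: the article's real setInt64/getInt64 may differ.
function setInt64(heapBuffer, ptr, value) {
  // BigInt() throws for non-integer numbers, which guards against bad input.
  new DataView(heapBuffer).setBigInt64(ptr, BigInt(value), true);
}

function getInt64(heapBuffer, ptr) {
  const value = new DataView(heapBuffer).getBigInt64(ptr, true);
  // Refuse values that a JavaScript number cannot represent exactly.
  if (value > BigInt(Number.MAX_SAFE_INTEGER) ||
      value < BigInt(Number.MIN_SAFE_INTEGER)) {
    throw new RangeError('int64 value exceeds the safe integer range');
  }
  return Number(value);
}
```

File offsets stay far below 2^53, so returning a plain number is convenient on the JavaScript side. Passing the buffer on every call also avoids holding a stale reference if Wasm memory grows.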

Conclusion

The article shows how to compile C code to WebAssembly with Emscripten, handle file I/O efficiently via WorkerFS, manage memory, integrate third‑party libraries, and bridge C‑to‑JavaScript and JavaScript‑to‑C calls, achieving up to 65 % faster MD5 computation and enabling complex tasks such as archive extraction directly in the browser.
