Ensuring Secure Write Paths in Hadoop S3A: Experiments, Benchmarks, and Best Practices
This article analyses the security of Hadoop S3A write paths in data lakes, explains fast upload mechanisms, demonstrates disk‑IO and network‑error simulations, compares checksum algorithms, and presents Alibaba Cloud EMR JindoSDK best‑practice results with performance and reliability evaluations.
Background
Data lakes increasingly rely on cloud object storage (e.g., S3) for its large capacity, low cost, and easy scalability. The S3 protocol has become the de‑facto standard, and many data platforms use S3A connectors that combine S3 semantics with Hadoop compatibility, such as Delta Lake on Databricks.
Hadoop S3 Write Support
Because S3 does not support incremental writes, the default S3A implementation buffers data locally and uploads it only when the file is closed, which can be inefficient for large files. Since Hadoop 2.8.5, setting fs.s3a.fast.upload=true enables fast upload: data is split into blocks (default 100 MiB) that are uploaded asynchronously as they are flushed, respecting S3 multipart constraints (minimum 5 MiB per part, maximum 10 000 parts).
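Those two limits together cap the size of any single object a fast-upload stream can produce. A back-of-envelope sketch makes the ceiling concrete (the property name fs.s3a.multipart.size, the standard S3A knob for this block size, is our addition, not from the text above):

```python
# Back-of-envelope: largest object fast upload can produce,
# given the S3 multipart limits quoted above.
MIB = 1024 ** 2
block_size = 100 * MIB   # default block size (fs.s3a.multipart.size)
max_parts = 10_000       # S3 multipart part-count limit

max_object = block_size * max_parts
print(f"max object ≈ {max_object / 1024**4:.2f} TiB")  # ≈ 0.95 TiB
```

With the 100 MiB default, a single file tops out just under 1 TiB; uploading anything larger requires a bigger block size.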
Enabling fast upload causes S3AFileSystem to create an S3ABlockOutputStream (which superseded the earlier S3AFastOutputStream). The stream delegates write/flush operations to an abstract S3ADataBlock, which can be an ArrayBlock (heap), DiskBlock (disk), or ByteBufferBlock (off‑heap). The choice is controlled by fs.s3a.fast.upload.buffer, defaulting to disk.
Disk Issues
Using disk as a buffer reduces memory pressure but introduces the Achilles’ heel of disk reliability: full disks, bad sectors, and occasional bit‑flips can jeopardise data integrity. Even with highly reliable disks, the probability of failure grows with the number of disks in a cluster.
For a replication factor R, disk annual failure rate P, and N disks, the number of distinct R‑replica placements is C(N,R) = N!/(R!·(N−R)!). Assuming independent failures, the probability that some replica set loses all R of its disks in a year is bounded above by C(N,R)·P^R, so the risk grows combinatorially with cluster size.
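Under the independence assumption, this bound is easy to evaluate; the numbers below are purely illustrative and not from the article:

```python
from math import comb

def replica_loss_bound(n_disks: int, r: int, afr: float) -> float:
    """Union-bound estimate: probability that at least one set of r
    disks (a full replica set) all fail within a year, assuming
    independent failures with annual failure rate `afr`."""
    return comb(n_disks, r) * afr ** r

# Illustrative only: 100 disks, 3 replicas, 1% annual failure rate
print(round(replica_loss_bound(100, 3, 0.01), 4))  # 0.1617
```

Even with a modest per-disk failure rate, the sheer number of possible replica sets makes the bound non-negligible, which is why end-to-end checksumming matters.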
Simulating Disk I/O Problems
a. Change fs.s3a.buffer.dir in core-site.xml to point to a real disk path (e.g., /data2/):

<property>
  <name>fs.s3a.fast.upload</name>
  <value>true</value>
</property>
<property>
  <!-- local buffer directory, will be created if missing -->
  <name>fs.s3a.buffer.dir</name>
  <value>/data2/tmp/</value>
</property>

b. Use a SystemTap (stap) script to force I/O errors on writes to /dev/vdc:
#!/usr/bin/stap
probe vfs.write.return {
    if (devname == "vdc") {
        $return = -5    # -5 == -EIO: report an I/O error to the caller
    }
}

c. Run a demo write program and observe the exception:
$ dd if=/dev/zero of=test-1G-stap bs=1G count=1
$ hadoop fs -put test-1G-stap s3a://<bucket>/
put: Input/output error

The Hadoop S3AFileSystem correctly propagates the I/O error as an IOException.
Simulating Disk Bit‑Flip
a. Modify the write method of the libfuse passthrough example so it corrupts the data being written, then mount /data2/ at /mnt/passthrough:

$ mkdir -p /mnt/passthrough/
$ ./passthrough /mnt/passthrough/ -omodules=subdir -osubdir=/data2/ -oauto_unmount

b. Point fs.s3a.buffer.dir at the mounted path in core-site.xml:
<property>
  <name>fs.s3a.fast.upload</name>
  <value>true</value>
</property>
<property>
  <!-- local buffer directory, will be created if missing -->
  <name>fs.s3a.buffer.dir</name>
  <value>/mnt/passthrough/</value>
</property>

c. Write a 1 GiB file through the mounted path and compare MD5 checksums:
$ mkdir -p input output
$ dd if=/dev/zero of=input/test-1G-fuse bs=1G count=1
$ hadoop fs -put input/test-1G-fuse s3a://<bucket>/
$ hadoop fs -get s3a://<bucket>/test-1G-fuse output/
$ md5sum input/test-1G-fuse output/test-1G-fuse

The checksums differ, showing that S3A cannot detect a bit‑flip that occurs after data reaches the local buffer.
Network Issues
Even with in‑memory writes, network problems such as bit‑flips or packet loss can corrupt data. The 2008 Amazon S3 incident demonstrated that multi‑router paths can cause undetectable bit‑flips, which bypass lower‑layer checksums.
S3 mitigates this by verifying the Content‑MD5 header sent with each upload part, providing end‑to‑end integrity over the network hop.
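Concretely, Content‑MD5 is the base64 encoding (not hex) of the raw 16‑byte MD5 digest of the part body; a minimal sketch of what an SDK computes per part:

```python
import base64
import hashlib

def content_md5(body: bytes) -> str:
    """Compute the Content-MD5 header value for an upload part:
    base64 of the raw 16-byte MD5 digest, per RFC 1864."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")

print(content_md5(b""))  # 1B2M2Y8AsgTpgAmY7PhCfg==
```

If even one bit of the body changes in transit, the server recomputes a different digest and rejects the part.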
Simulating Network Bit‑Flip
a. Install mitmproxy and write an addons.py script that corrupts the last byte of each PUT request:
from mitmproxy import ctx, http

class HookOssRequest:
    def request(self, flow: http.HTTPFlow):
        # Flip the last byte of every PUT body sent to the OSS endpoint
        if (flow.request.host == "<bucket>.oss-cn-shanghai-internal.aliyuncs.com"
                and flow.request.method == "PUT"):
            clen = len(flow.request.content)
            clist = list(flow.request.content)
            clist[clen - 1] = ord('a')
            flow.request.content = bytes(clist)
            ctx.log.info(f"updated byte at {clen - 1}")

    def response(self, flow: http.HTTPFlow):
        pass

addons = [HookOssRequest()]

b. Run the reverse proxy on localhost:8765 and point fs.s3a.endpoint to it (disable SSL):
$ mitmdump -s addons.py -p 8765 --set block_global=false --mode reverse:http://<bucket>.oss-cn-shanghai-internal.aliyuncs.com

<property>
  <name>fs.s3a.connection.ssl.enabled</name>
  <value>false</value>
</property>
<property>
  <name>fs.s3a.fast.upload</name>
  <value>true</value>
</property>

c. Upload a ~100 MiB file and observe the failure:
$ dd if=/dev/zero of=input/test-100M-proxy bs=$((100*1024*1024+1)) count=1
$ hadoop fs -put input/test-100M-proxy s3a://<bucket>/
WARN s3a.S3ABlockOutputStream: Transfer failure of block ...
com.amazonaws.AmazonClientException: Unable to verify integrity of data upload. Content-MD5 mismatch.

S3 detects the corrupted part via the MD5 check.
Simulating Network Packet Loss
Modify addons.py to drop the second multipart request:
if "partNumber=2" in flow.request.path:
    flow.response = http.HTTPResponse.make(200, b"Hello World", {"Content-Type": "text/html"})
    ctx.log.info("drop part-2 request!")

After uploading, S3 reports a missing-part error during CompleteMultipartUpload, confirming that multipart integrity is verified.
Checksum Algorithm Selection
MD5, SHA‑1, SHA‑256, and SHA‑512 are common hash functions. MD5 and SHA‑1 are cryptographically broken (collisions can be constructed), though they still catch accidental corruption; SHA‑256/512 are safer but slower. CRC algorithms (CRC32, CRC64) are not cryptographic, but they are much faster and provide strong detection of accidental transmission errors.
Benchmark results (100 MiB, 8 threads) show:
CRC32 ≈ 10 ms
CRC64 ≈ 86 ms
MD5 ≈ 175 ms
SHA‑256 ≈ 344 ms
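The ordering of those figures can be reproduced roughly with the Python standard library (single-threaded here rather than 8 threads, and CRC64 is omitted because the stdlib has no implementation, so absolute times will differ):

```python
import hashlib
import time
import zlib

data = b"\x00" * (100 * 1024 * 1024)  # 100 MiB of zeros, matching the dd inputs

def bench(name, fn):
    # Time a single pass of the checksum over the 100 MiB buffer
    start = time.perf_counter()
    fn(data)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{name}: {elapsed_ms:.1f} ms")

bench("crc32", zlib.crc32)
bench("md5", lambda d: hashlib.md5(d).digest())
bench("sha256", lambda d: hashlib.sha256(d).digest())
```

The relative ranking (CRC fastest, SHA‑256 slowest) matches the table above even though absolute figures vary by machine.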
Alibaba Cloud OSS supports MD5 and CRC64; CRC64 is preferred for its speed and reliability.
Best Practice with Alibaba Cloud EMR JindoSDK
JindoSDK’s JindoOutputStream offers two checksum modes:
Request‑level checksum (MD5) – disabled by default; enable via fs.oss.checksum.md5.enable=true .
Block‑level checksum (CRC64) – enabled by default; disable via fs.oss.checksum.crc64.enable=false .
Comparative results (jindosdk‑4.6.2 vs. S3AFileSystem):

Scenario             | S3AFileSystem                 | JindoOssFileSystem
Disk I/O error       | Throws java.io.IOException    | Throws java.io.IOException
Disk bit-flip        | Not detected                  | Throws java.io.IOException
Network bit-flip     | Throws AWSClientIOException   | Throws java.io.IOException
Network packet loss  | Throws AWSClientIOException   | Throws java.io.IOException
Write 5 GiB file     | 13.375 s                      | 6.849 s
JindoSDK provides more complete error detection and better performance.
Conclusion and Outlook
Secure data‑lake writes must consider memory, disk, and network unreliability, and select appropriate checksum algorithms. Understanding the full write path and testing for each failure mode ensures data integrity. Future work will extend these techniques to random‑read scenarios in OSS‑HDFS, which currently lack built‑in verification.
Appendix 1: S3A Configuration Example
<property>
  <name>fs.s3a.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>
<property>
  <name>fs.s3a.fast.upload</name>
  <value>true</value>
</property>
<property>
  <name>fs.s3a.buffer.dir</name>
  <value>/mnt/passthrough/</value>
</property>

Appendix 2: JindoSDK Configuration Example
<property>
  <name>fs.AbstractFileSystem.oss.impl</name>
  <value>com.aliyun.jindodata.oss.OSS</value>
</property>
<property>
  <name>fs.oss.impl</name>
  <value>com.aliyun.jindodata.oss.JindoOssFileSystem</value>
</property>
<property>
  <name>fs.oss.checksum.crc64.enable</name>
  <value>true</value>
</property>

Big Data Technology Architecture
Exploring Open Source Big Data and AI Technologies