Performance Tuning of JED Database on Huawei Kunpeng ARM vs Intel X86 Platforms
This technical report covers the background, hardware configuration, database setup, tuning results, and step-by-step optimization procedures (BIOS, OS, network, container NUMA binding, MySQL CRC32 patching, and Go PGO tuning) used to improve JED performance on ARM relative to Intel.
Project Background
In response to national initiatives promoting independent technology, the project replaces foreign components with domestic ones, starting with databases. JED is deployed on a Huawei Kunpeng ARM server and compared with an Intel X86 server to evaluate performance after tuning.
Physical Machine Configuration
| Processor Vendor | Architecture | CPU Model | CPU | Turbo | Memory Frequency | OS |
| --- | --- | --- | --- | --- | --- | --- |
| Huawei | ARM | kunpeng920-7262C | 128C | None | 3200MT/s | Euler |
| Intel | X86 | platinum-8338C-3rd | 128C | Enabled | 3200MT/s | CentOS 8 |
Database Configuration
Deployment Site: Langfang
Deployment Method: Container
Gateway Config: 16C/12G, disk /export: 30G
DB Architecture: 1 cluster, primary-secondary
DB Resources: 8C/24G, disk /export: 512G
Optimization Results
Before tuning, under 50% background load, JED on Kunpeng achieved 58% of Intel's read performance and 68% of its write performance. After tuning, read performance reached 99% of Intel, write performance 121%, and mixed read/write (7:3) hit 113%, with TP99 and response times improved while CPU usage stayed at 100%.
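The headline figures can also be restated as a before/after ratio. The sketch below assumes the Intel baseline was the same in both measurements (the report does not state this explicitly); `relative_gain` is a hypothetical helper name.

```shell
# Ratio of the post-tuning figure to the pre-tuning figure, where both
# inputs are "Kunpeng as a percentage of Intel" values quoted above.
# Assumption: the Intel baseline did not change between the two runs.
relative_gain() {
    awk -v before="$1" -v after="$2" 'BEGIN { printf "%.2f\n", after / before }'
}

relative_gain 58 99     # read:  58% -> 99% of Intel, prints 1.71
relative_gain 68 121    # write: 68% -> 121% of Intel, prints 1.78
```

Under that assumption, Kunpeng read throughput improved by roughly 1.7x and write throughput by roughly 1.8x.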
Specific Tuning Steps
BIOS Optimization
Requires data‑center modification and host reboot.
Expected changes: disable CPU prefetching, set Power Policy to Performance, keep SMMU enabled.
Host OS Optimization
Disable firewall (already disabled in production):
systemctl status firewalld.service
systemctl stop firewalld.service
systemctl disable firewalld.service
systemctl status firewalld.service
Network kernel parameters (no noticeable gain, left unchanged):
echo 1024 >/proc/sys/net/core/somaxconn
echo 16777216 >/proc/sys/net/core/rmem_max
echo 16777216 >/proc/sys/net/core/wmem_max
echo "4096 87380 16777216" >/proc/sys/net/ipv4/tcp_rmem
echo "4096 65536 16777216" >/proc/sys/net/ipv4/tcp_wmem
echo 360000 >/proc/sys/net/ipv4/tcp_max_syn_backlog
IO Scheduler Optimization
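Before switching schedulers with the commands in this section, it can be useful to record what each device currently uses. The kernel lists the available schedulers in `queue/scheduler` and marks the active one with square brackets; the names on offer differ between kernels (multi-queue kernels typically expose `mq-deadline` rather than `deadline`). A minimal read-only sketch, where `active_scheduler` and `report_schedulers` are hypothetical helper names:

```shell
# Print the scheduler marked active (the bracketed entry) in the text of a
# queue/scheduler file, e.g. "noop [deadline] cfq". Prints nothing if no
# entry is bracketed.
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Report the active scheduler for every block device (read-only).
report_schedulers() {
    for f in /sys/block/*/queue/scheduler; do
        [ -r "$f" ] || continue
        printf '%s: %s\n' "$f" "$(active_scheduler "$(cat "$f")")"
    done
}
```

Running `report_schedulers` before and after the change gives a simple record of what was modified.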
echo deadline > /sys/block/nvme0n1/queue/scheduler
echo deadline > /sys/block/nvme1n1/queue/scheduler
echo deadline > /sys/block/nvme2n1/queue/scheduler
echo deadline > /sys/block/nvme3n1/queue/scheduler
echo deadline > /sys/block/sda/queue/scheduler
echo 2048 > /sys/block/nvme0n1/queue/nr_requests
echo 2048 > /sys/block/nvme1n1/queue/nr_requests
echo 2048 > /sys/block/nvme2n1/queue/nr_requests
echo 2048 > /sys/block/nvme3n1/queue/nr_requests
echo 2048 > /sys/block/sda/queue/nr_requests
Cache Parameter Optimization
echo 5 >/proc/sys/vm/dirty_ratio
echo 1 >/proc/sys/vm/swappiness
Network Card IRQ Binding
Adjust ethX queue count and bind IRQs to CPU cores (example for eth0):
ethtool -l eth0
ethtool -L eth0 combined 8
systemctl stop irqbalance
systemctl disable irqbalance
for i in $(cat /proc/interrupts | grep $(ethtool -i eth0 | grep -i bus-info | awk -F ': ' '{print $2}') | awk -F ':' '{print $1}'); do echo 31 > /proc/irq/$i/smp_affinity_list; done
Business Container NUMA Binding
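The CPU range chosen for the binding in this section should match the container's core allocation (8C for the DB container described earlier) and sit inside a single NUMA node. A small sketch for checking both, assuming the standard Linux sysfs path `/sys/devices/system/node/nodeN/cpulist`; `count_cpus` and `node_cpulist` are hypothetical helper names:

```shell
# Count the CPUs in a kernel-style CPU list such as "16-23" or "0-3,8,10-11".
# The function body runs in a subshell so the IFS change does not leak.
count_cpus() (
    total=0
    IFS=','
    for part in $1; do
        lo=${part%-*}    # text before the dash (or the whole item)
        hi=${part#*-}    # text after the dash (or the whole item)
        total=$(( $total + $hi - $lo + 1 ))
    done
    printf '%d\n' "$total"
)

# Print the CPU list the kernel reports for a NUMA node (read-only).
node_cpulist() {
    cat "/sys/devices/system/node/node$1/cpulist"
}
```

For example, `count_cpus 16-23` prints 8, which matches the 8C DB container, and comparing the range against `node_cpulist 0` shows whether those cores actually belong to node 0 on a given host.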
Modify container cgroup to bind CPUs and memory to a specific NUMA node before deployment:
# Enter container cgroup directory
cd /sys/fs/cgroup/cpuset/kubepods/burstable/podXXXXXXXX/7b40a68aXXXXXXXX
# Stop Docker (restart resets cgroup)
systemctl stop docker
# Set CPU and memory sets
echo 16-23 > cpuset.cpus
echo 0 > cpuset.mems
MySQL CRC32 Soft-to-Hard Compilation for ARM
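Hardware CRC32 only helps if the CPU exposes the instructions. On Linux arm64 the kernel reports this as a `crc32` flag on the `Features` line of `/proc/cpuinfo`, so a quick check before building is possible. A minimal sketch; `has_hw_crc32` is a hypothetical helper name, and it takes the cpuinfo text as an argument so it can be tried on any sample:

```shell
# Succeed (exit 0) if the given /proc/cpuinfo-style text has a Features
# line that lists the crc32 flag as a whole word.
has_hw_crc32() {
    printf '%s\n' "$1" | grep -Eq '^Features.*[[:space:]]crc32([[:space:]]|$)'
}

# Example use on the build host:
#   has_hw_crc32 "$(cat /proc/cpuinfo)" && echo "crc32 instructions available"
```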
cd /mysql-5.7.26
git apply crc32-mysql5.7.26.patch
Go Version Upgrade and PGO Optimization
Upgrade Go to 1.21 and enable PGO profiling:
import _ "net/http/pprof"
# Run the program under load and collect profile
curl -o cpu.pprof "http://localhost:8080/debug/pprof/profile?seconds=304"
mv cpu.pprof default.pgo
go build -pgo=auto
Benchmark results from the Go team show a 2-7% performance gain for representative programs when built with PGO.
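PGO became generally available in Go 1.21 (it was a preview in 1.20), and `-pgo=auto` looks for a file named `default.pgo` in the main package directory. A build script can therefore guard against silently building without a profile. A hedged sketch, with `go_has_default_pgo` and `pgo_profile_ready` as hypothetical helper names:

```shell
# Succeed if the given `go version` output reports Go 1.21 or newer,
# e.g. "go version go1.21.0 linux/arm64". Fails if no version is found.
go_has_default_pgo() {
    printf '%s\n' "$1" | awk '
        match($0, /go[0-9]+\.[0-9]+/) {
            # Strip the leading "go" and split "1.21" into major and minor.
            split(substr($0, RSTART + 2, RLENGTH - 2), v, ".")
            found = 1
            ok = (v[1] + 0 > 1 || (v[1] + 0 == 1 && v[2] + 0 >= 21))
        }
        END { exit (found && ok) ? 0 : 1 }'
}

# Succeed if a non-empty default.pgo exists in the given directory
# (defaults to the current directory).
pgo_profile_ready() {
    [ -s "${1:-.}/default.pgo" ]
}

# Example use in a build script (not executed here):
#   go_has_default_pgo "$(go version)" && pgo_profile_ready . && go build -pgo=auto
```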
Conclusion
The combined tuning across BIOS, OS, network, container, MySQL compilation, and the Go application closes the read-performance gap between the ARM-based Kunpeng server and the Intel X86 server (99% of Intel) and puts Kunpeng ahead on write (121%) and mixed read/write (113%) for the JED database workload.
JD Retail Technology
Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.