Top 10 Linux Ops Troubleshooting Tips Every Sysadmin Should Know
This article compiles ten common Linux operational problems—from shell script failures and cron output issues to disk space leaks and MySQL storage errors—detailing their causes and step‑by‑step solutions to help engineers quickly diagnose and resolve system faults.
As a Linux operations engineer, encountering various problems and failures is inevitable; summarizing experiences, investigating root causes, and documenting solutions is a good habit that turns practice into valuable knowledge.
The following list gathers ten typical issues you may meet during projects, along with their causes and fixes.
Common Linux Issues and Solutions
1. Shell script does not execute
Problem: A colleague reports a simple shell script fails with "bad interpreter: No such file or directory".
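Cause: The script was edited on Windows, which saved it with CRLF line endings (\r\n). On Linux the extra carriage return shows up as ^M, so the kernel looks for an interpreter whose name ends in ^M (for example /bin/sh^M), which does not exist.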
Solution:
Rewrite the script directly on Linux.
Remove the Windows line endings in vi with :%s/\r//g or :%s/^M//g (type ^M with Ctrl+V, Ctrl+M).
Use sh -x script.sh for step-by-step execution and debugging.
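When several scripts need the same cleanup, or no editor is at hand, the carriage-return removal can be sketched in plain shell. The helper names has_cr and strip_cr are illustrative, not standard tools, and strip_cr removes every carriage return in the file, not only those at line ends.

```shell
#!/bin/sh
# has_cr FILE: succeed if FILE contains at least one carriage return (\r).
has_cr() {
    grep -q "$(printf '\r')" "$1"
}

# strip_cr FILE: delete all carriage returns from FILE. The cleaned copy is
# written back into the original file, so its permissions are preserved.
strip_cr() {
    tmp=$(mktemp) || return 1
    tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return $status
}
```

A typical call would be has_cr script.sh && strip_cr script.sh, which leaves files without carriage returns untouched.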
2. Crontab output fills /var/spool/clientmqueue
Problem: The /var/spool/clientmqueue directory exceeds 100 GB.
Cause: Cron jobs produce output that is mailed to the user who owns the crontab; because sendmail is not running, the undelivered messages accumulate as files.
Solution:
Manually delete the files from inside that directory: ls | xargs rm -f
Suppress output in cron entries by appending >/dev/null 2>&1 to the command.
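The cleanup and the cron change can be sketched together as follows. The function name purge_queue and the script path in the crontab comment are placeholders, not taken from the original incident. This sketch uses find instead of ls so that unusual file names are handled safely.

```shell
#!/bin/sh
# purge_queue DIR: delete every regular file under DIR. find hands the
# names to rm directly, so spaces or quotes in file names cause no trouble.
purge_queue() {
    find "$1" -type f -exec rm -f {} +
}

# Example crontab entry that discards all output so nothing is mailed
# (the script path is a placeholder):
#   0 2 * * * /usr/local/bin/backup.sh >/dev/null 2>&1
```

For the case above the call would be purge_queue /var/spool/clientmqueue, run as root.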
3. Telnet/SSH is slow
Problem: Telnet to a remote host is sluggish, while ping works and DNS lookup fails.
Cause: Reverse DNS lookup on the client’s IP is timing out.
Solution:
Add the correct hostname-to-IP mapping to /etc/hosts.
Comment out the non-functional nameserver in /etc/resolv.conf or use a reliable one.
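Adding the mapping by hand is fine for one host; for repeated use, a small idempotent helper is one option. This is a sketch only: add_hosts_entry is a made-up name, and the address and host name in the usage note are placeholders.

```shell
#!/bin/sh
# add_hosts_entry FILE IP NAME: append "IP NAME" to a hosts-format FILE
# unless some line already has IP as its first field.
add_hosts_entry() {
    if awk -v ip="$2" '$1 == ip { found = 1 } END { exit !found }' "$1" 2>/dev/null; then
        return 0    # mapping already present, nothing to do
    fi
    printf '%s\t%s\n' "$2" "$3" >> "$1"
}
```

On the slow server this might be called as add_hosts_entry /etc/hosts 192.0.2.10 client01; running it twice leaves a single entry.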
4. Read‑only file system error (MySQL)
Problem: MySQL fails to create a table, reporting "ERROR 1005 (HY000): Can't create table … (errno: 30)" which indicates a read‑only file system.
Possible causes:
File system corruption.
Bad disk sectors.
Incorrect /etc/fstab entries (e.g., wrong file-system type).
Solution: Reboot the test machine or remount the file system; in some cases mount -o remount,rw /dev/… resolves the issue.
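Before remounting, it can help to confirm that the kernel really has the file system mounted read-only. A minimal sketch, assuming the standard /proc/mounts layout (device, mount point, type, options); the function name is_readonly is illustrative.

```shell
#!/bin/sh
# is_readonly MOUNTPOINT [MOUNTS_FILE]: succeed if the last entry for
# MOUNTPOINT lists "ro" among its mount options. Options such as
# "errors=remount-ro" do not count, because each option is compared whole.
# An unknown mount point is reported as not read-only.
is_readonly() {
    awk -v mp="$1" '
        $2 == mp { opts = $4 }    # later entries override earlier ones
        END {
            n = split(opts, o, ",")
            for (i = 1; i <= n; i++) if (o[i] == "ro") exit 0
            exit 1
        }' "${2:-/proc/mounts}"
}
```

For example, is_readonly /var/lib/mysql && echo "read-only" checks the mount point given as its argument; the path here is only an assumption about where the MySQL data lives.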
5. Deleted file does not free disk space
Problem: df -h shows 90 GB used, but du -sh * accounts for only 30 GB.
Cause: A process still holds an open file descriptor to a deleted file.
Solution:
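Identify the offending process: <code>/usr/sbin/lsof | grep deleted</code>
Terminate or restart the process so the descriptor is closed, or truncate the deleted file through its descriptor, e.g., echo > /proc/25575/fd/33.
To empty a file that a running process still writes to, truncate it instead of deleting it: cat /dev/null > file.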
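If lsof is not installed, the same information can be read from /proc on Linux, where a descriptor that points at a removed file resolves to a path ending in " (deleted)". The function name list_deleted_fds is illustrative, and reading another user's descriptors normally requires root.

```shell
#!/bin/sh
# list_deleted_fds PID: print every descriptor of PID whose target file
# has been deleted, as "/proc/PID/fd/N -> /path (deleted)".
list_deleted_fds() {
    for fd in /proc/"$1"/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            *' (deleted)') printf '%s -> %s\n' "$fd" "$target" ;;
        esac
    done
}
```

The printed /proc/PID/fd/N path is the one to truncate, in the same way as the echo > /proc/25575/fd/33 example.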
6. Improve performance of find cleanup script
Problem: A nightly find command that deletes old picture_* files causes high load.
Cause: Scanning a directory with many entries is resource‑intensive.
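Solution: Replace find with a lighter shell pipeline that filters the ls -l listing by date:
<code>#!/bin/sh
# Delete entries in /tmp named *picture* whose ls -l date is two days ago.
cd /tmp || exit 1
# "%b %e" matches the ls -l date column, e.g. "Dec 22" or "Dec  2"
time=$(LC_ALL=C date -d "2 days ago" "+%b %e")
LC_ALL=C ls -l | grep "picture" | grep "$time" | awk '{print $NF}' | xargs rm -rf
</code>
7. Unable to obtain gateway MAC address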
Problem: ARP fails to retrieve the MAC address of the gateway.
Solution:
Bind a static ARP entry: arp -s 192.168.3.254 00:00:5e:00:01:64
8. HTTP service fails to start (port 7080)
Problem: Starting httpd reports "Address already in use" for port 7080.
Cause:
The port appears occupied, yet netstat -npl | grep 7080 shows nothing listening on it.
The same port is defined in multiple configuration files, so httpd tries to bind it twice and the second attempt fails.
Solution: Comment out the duplicate Listen 7080 line in /etc/httpd/conf.d/t.10086.cn.conf and restart the service.
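Duplicate Listen directives are easy to miss when they live in different files. One way to surface them is a recursive grep over the configuration tree; the sketch below assumes GNU or BSD grep (for -r), and find_listen is a made-up helper name.

```shell
#!/bin/sh
# find_listen PORT DIR: print file:line:text for every active (uncommented)
# Listen directive under DIR that binds PORT, with or without an address
# prefix such as 10.0.0.1: or [::]: in front of the port.
find_listen() {
    grep -rnE "^[[:space:]]*Listen[[:space:]]+([^[:space:]]*:)?$1([[:space:]]|\$)" "$2"
}
```

Running find_listen 7080 /etc/httpd and seeing more than one line of output points to the duplicate definition described above.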
9. "Too many open files" error
Problem: System reports "too many open files".
Solution: Increase the open-file (nofile) and process (nproc) limits:
<code># Persistent per-user limits, read at the next login
echo "" >> /etc/security/limits.conf
echo "* soft nproc 65535" >> /etc/security/limits.conf
echo "* hard nproc 65535" >> /etc/security/limits.conf
echo "* soft nofile 65535" >> /etc/security/limits.conf
echo "* hard nofile 65535" >> /etc/security/limits.conf
# Also raise the limits in root's login shell
echo "" >> /root/.bash_profile
echo "ulimit -n 65535" >> /root/.bash_profile
echo "ulimit -u 65535" >> /root/.bash_profile
</code>
Then log in again (or reboot) so the new limits take effect, or run ulimit -u 65535 && ulimit -n 65535 in the current shell.
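To see how close a particular process is to its limit before and after the change, /proc exposes both numbers on Linux. A minimal sketch; fd_usage is an illustrative name, and inspecting another user's process normally requires root.

```shell
#!/bin/sh
# fd_usage PID: print "OPEN/SOFT_LIMIT" for PID, where OPEN is the number
# of descriptors it currently holds and SOFT_LIMIT is its soft nofile
# limit (which may be the word "unlimited").
fd_usage() {
    open=$(ls /proc/"$1"/fd | wc -l)
    soft=$(awk '/^Max open files/ { print $4 }' /proc/"$1"/limits)
    echo "$open/$soft"
}
```

For instance, fd_usage $$ reports the figures for the current shell. Already-running services keep their old limit until they are restarted.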
10. ibdata1 and mysql‑bin logs consume disk space
Problem: Disk usage alarm; ibdata1 > 120 GB and mysql-bin > 80 GB.
Cause:
ibdata1 stores InnoDB table data and indexes in one shared tablespace file, which does not shrink on its own when data is deleted.
Binary logs accumulate over time.
Solution:
For an oversized ibdata1, dump the databases, stop MySQL, delete the file, restart so the tablespace is recreated, and reload the dump.
To prune binary logs:
<code>mysql> PURGE MASTER LOGS TO 'mysql-bin.010';
mysql> PURGE MASTER LOGS BEFORE '2010-12-22 13:00:00';
</code>
Or set expire_logs_days=30 in /etc/my.cnf for automatic cleanup.
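Before and after purging, it is useful to know how much space the binary logs actually occupy. The sketch below sums the sizes reported by ls without reading the files. The function name binlog_bytes and the /var/lib/mysql path in the usage note are assumptions, and the mysql-bin prefix matches the log names shown above.

```shell
#!/bin/sh
# binlog_bytes DIR: print the total size in bytes of the mysql-bin.NNNNNN
# files directly under DIR (the mysql-bin.index file is not counted).
binlog_bytes() {
    find "$1" -maxdepth 1 -type f -name 'mysql-bin.[0-9]*' -exec ls -ln {} + |
        awk '{ s += $5 } END { printf "%.0f\n", s }'    # field 5 of ls -ln is the size
}
```

A call such as binlog_bytes /var/lib/mysql prints a single number, which can be compared before and after the PURGE statements.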
Fault‑troubleshooting summary table
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.