Detecting, Cleaning, and Preventing Sensitive Data in Git Repositories
This article explains how to identify, remove, and avoid committing sensitive information such as passwords, keys, or tokens in Git repositories by using git log searches, tools like Gitleaks and Detect‑Secrets, and scripts for history rewriting, while also describing preventive pre‑commit hook setups.
If sensitive information accidentally enters a Git repository, it can cause serious security issues. The article introduces methods to detect, clean, and prevent such leaks.
Detecting Sensitive Information
1. Use git log to search for specific secrets:
## Search the Git history for a string
git log -S "<sensitive word, password, etc.>" --oneline
## Search the Git history for a file
git log --all --full-history -- "**/thefile.*"
2. Use Gitleaks, an open‑source scanner that matches predefined patterns (passwords, keys, tokens), so you do not need to know the secret's content in advance. The following example of Gitleaks JSON output shows a detected AWS secret:
{
  "Description": "AWS",
  "StartLine": 37,
  "EndLine": 37,
  "StartColumn": 19,
  "EndColumn": 38,
  "Match": "\t\t\"aws_secret= \"AKIAIMNOJVGFDXXXE4OA\"\": true,",
  "Secret": "AKIAIMNOJVGFDXXXE4OA",
  "File": "checks_test.go",
  "Commit": "ec2fc9d6cb0954fb3b57201cf6133c48d8ca0d29",
  "Entropy": 0,
  "Author": "zricethezav",
  "Email": "[email protected]",
  "Date": "2018-01-28 17:39:00 -0500 -0500",
  "Message": "[update] entropy check",
  "Tags": [],
  "RuleID": "aws-access-token"
}
3. Use Detect‑Secrets, which offers a plugin mechanism for custom patterns. Both tools can be integrated as pre‑commit hooks.
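To make the idea of pattern-based detection concrete, here is a minimal sketch of matching a predefined pattern against text. It is not how Gitleaks or Detect‑Secrets are implemented: the function name `find_aws_key_ids` is hypothetical, and the regular expression is a simplified assumption about what an AWS access key ID looks like.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every substring on stdin that looks like an
# AWS access key ID. The regex is a simplified assumption, not the rule
# set that any real scanner ships with.
find_aws_key_ids() {
  # -E: extended regex, -o: print only the matching part.
  # "|| true" keeps the function from failing when nothing matches.
  grep -Eo 'AKIA[0-9A-Z]{16}' || true
}

# Sample line taken from the "Match" field of the JSON output above.
sample='"aws_secret= \"AKIAIMNOJVGFDXXXE4OA\"": true,'

found=$(printf '%s\n' "$sample" | find_aws_key_ids)
echo "found: $found"
```

Inside a repository, the same function could be fed the full patch history, for example with `git log -p --all | find_aws_key_ids`. A dedicated scanner remains the better choice in practice because it covers many more patterns.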
Cleaning Sensitive Information
Because every Git commit is immutable and each clone keeps the full history, deleting a secret in a new commit does not remove it from earlier commits; the history itself has to be rewritten. The recommended approach is to back up the affected files, purge their history, and then restore the files. The following Bash script generates a list of deleted files and rewrites history to remove them:
## The following script runs only on Linux
## Generate the list of deleted files, deleted.txt
## Note: the pattern only handles paths that contain a slash and are made up of
## letters, digits, dots, and slashes; other paths are skipped or truncated
git log --diff-filter=D --summary <start-commit-id>..HEAD | grep -E -o '[[:alnum:]]*(/[[:alnum:].]*)+$' > deleted.txt
## Purge the history
for del in $(cat deleted.txt)
do
  git filter-branch --index-filter "git rm --cached --ignore-unmatch -- '$del'" --prune-empty -- --all
  # The following seems to be necessary every time
  # because otherwise git won't overwrite refs/original
  git reset --hard
  git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
  git reflog expire --expire=now --all
  git gc --aggressive --prune=now
done
Preventing Sensitive Data Commits
Pre‑commit hooks using Gitleaks or Detect‑Secrets can reject commits that contain secrets, providing a proactive defense. References for configuring these hooks are provided.
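As a rough illustration of what such a hook does, the sketch below checks added diff lines against a single pattern and reports whether the commit should be rejected. It is a hand-rolled stand-in, not the Gitleaks or Detect‑Secrets hook: the function name `staged_diff_is_clean`, the demo diff, and the simplified AWS key regex are all assumptions made for this example.

```shell
#!/usr/bin/env bash
# Sketch of the logic behind a .git/hooks/pre-commit script (hypothetical;
# a real setup would call a dedicated scanner instead of this grep).

# Reads a unified diff on stdin. Returns 1 when an added line contains
# something that looks like an AWS access key ID, 0 otherwise.
# The regex is a simplified assumption.
staged_diff_is_clean() {
  # Keep added lines ("+..."), drop the "+++ b/file" headers, then match.
  if grep '^+' | grep -v '^+++ ' | grep -Eq 'AKIA[0-9A-Z]{16}'; then
    return 1
  fi
  return 0
}

# Demo input standing in for the output of `git diff --cached -U0`.
demo_diff='+++ b/config.go
+aws_secret = "AKIAIMNOJVGFDXXXE4OA"'

if printf '%s\n' "$demo_diff" | staged_diff_is_clean; then
  result="clean"
else
  result="rejected"
fi
echo "$result"   # prints "rejected"
```

In an actual hook, the demo input would be replaced by `git diff --cached -U0 | staged_diff_is_clean`, and the script would exit with a non-zero status to make Git abort the commit. Both Gitleaks and Detect‑Secrets document their own hook integrations, which should be preferred over a single hand-written pattern.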