Detecting, Cleaning, and Preventing Sensitive Data in Git Repositories

This article explains how to identify, remove, and avoid committing sensitive information such as passwords, keys, or tokens in Git repositories by using git log searches, tools like Gitleaks and Detect‑Secrets, and scripts for history rewriting, while also describing preventive pre‑commit hook setups.

DevOps

SmartIDE is an open‑source cloud‑native IDE that lets developers start a development environment with a single smartide start command, supporting multiple languages and IDEs on Windows, macOS, and Linux.

If sensitive information accidentally enters a Git repository, it can cause serious security issues. The article introduces methods to detect, clean, and prevent such leaks.

Detecting Sensitive Information

1. Use git log to search for specific secrets:

## Search the Git history for a string
git log -S "<sensitive word, password text, etc.>" --oneline
## Search the Git history for a file
git log --all --full-history -- "**/thefile.*"

2. Use Gitleaks, an open‑source scanner that matches predefined patterns (passwords, keys, tokens) without prior knowledge of the secret content. Example JSON output from Gitleaks shows a detected AWS secret.

{
    "Description": "AWS",
    "StartLine": 37,
    "EndLine": 37,
    "StartColumn": 19,
    "EndColumn": 38,
    "Match": "\t\t\"aws_secret= \"AKIAIMNOJVGFDXXXE4OA\"\":          true,",
    "Secret": "AKIAIMNOJVGFDXXXE4OA",
    "File": "checks_test.go",
    "Commit": "ec2fc9d6cb0954fb3b57201cf6133c48d8ca0d29",
    "Entropy": 0,
    "Author": "zricethezav",
    "Email": "[email protected]",
    "Date": "2018-01-28 17:39:00 -0500 -0500",
    "Message": "[update] entropy check",
    "Tags": [],
    "RuleID": "aws-access-token"
}

3. Use Detect‑Secrets, which offers a plugin mechanism for custom patterns. Both tools can be integrated as pre‑commit hooks.
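The first two detection methods can be sketched as small shell helpers. This is a minimal sketch rather than a definitive implementation: the function names find_secret_commits and list_findings and the report file name are hypothetical, and list_findings assumes a pretty-printed Gitleaks JSON report with one field per line and RuleID appearing after File and Commit, as in the sample output above. The gitleaks detect flags shown in the comment are assumed from the Gitleaks v8 CLI and may differ in other versions.

```shell
#!/usr/bin/env bash

# find_secret_commits NEEDLE
# Method 1: list every commit on any branch whose changes add or remove
# NEEDLE, followed by the files in which the occurrence count changed.
find_secret_commits() {
    git log --all --oneline --name-only -S"$1"
}

# list_findings REPORT
# Method 2: reduce a Gitleaks JSON report to one line per finding:
# RuleID <TAB> File <TAB> Commit.
# Assumption: the report was produced with something like
#   gitleaks detect --source . --report-format json --report-path report.json
# and is pretty-printed with one field per line.
list_findings() {
    awk -F'"' '
        $2 == "File"   { file = $4 }
        $2 == "Commit" { commit = $4 }
        $2 == "RuleID" { print $4 "\t" file "\t" commit }
    ' "$1"
}
```

For example, find_secret_commits 'AKIAIMNOJVGFDXXXE4OA' would list the commits that introduced or removed that key. If a JSON-aware tool such as jq is available, it would be a more robust way to read the report than awk.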

Cleaning Sensitive Information

Because Git commits are immutable, a secret deleted in a later commit is still present in every earlier one, so removing it requires rewriting history. The recommended approach is to back up the affected files, delete them from the repository, purge their history, and then restore the files. The following Bash script generates a list of the deleted files and rewrites history to remove them:

## The following script is intended to run on Linux (Bash)
## Generate deleted.txt, the list of deleted files
## --name-only prints each deleted path on its own line; sort -u and sed drop duplicates and blank lines
git log --diff-filter=D --name-only --pretty=format: <starting commit ID>..HEAD | sort -u | sed '/^$/d' > deleted.txt

## Purge the history
## Note: this loop splits on whitespace, so paths containing spaces are not handled
for del in $(cat deleted.txt)
do
    git filter-branch --index-filter "git rm --cached --ignore-unmatch $del" --prune-empty -- --all
    # The following seems to be necessary every time
    # because otherwise git won't overwrite refs/original
    git reset --hard
    git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
    git reflog expire --expire=now --all
    git gc --aggressive --prune=now
done
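
After the rewrite, it is worth confirming that the purged paths no longer appear in any commit. The following is a minimal sketch, assuming the deleted.txt list produced by the script above; the function names path_in_history and check_purged are hypothetical.

```shell
#!/usr/bin/env bash

# path_in_history PATH
# Succeeds if at least one commit reachable from any ref still touches PATH.
path_in_history() {
    [ -n "$(git log --all --full-history -1 --format=%H -- "$1")" ]
}

# check_purged LIST
# Reads one path per line from LIST and reports any that survived the rewrite.
# Returns non-zero if at least one path is still present in the history.
check_purged() {
    local rc=0 path
    while IFS= read -r path; do
        if path_in_history "$path"; then
            echo "still in history: $path"
            rc=1
        fi
    done < "$1"
    return "$rc"
}
```

Running check_purged deleted.txt from the repository root prints nothing and returns success when every listed path is gone from all refs.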

Preventing Sensitive Data Commits

Pre‑commit hooks using Gitleaks or Detect‑Secrets can reject commits that contain secrets, providing a proactive defense. References for configuring these hooks are provided.
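
As a minimal sketch of such a hook, the helper below writes a pre-commit script that scans staged changes and aborts the commit when a secret is found. It assumes Gitleaks v8 is on the PATH and that its protect --staged subcommand is available (newer releases may expose this under a different subcommand); the function name install_secret_hook is hypothetical.

```shell
#!/usr/bin/env bash

# install_secret_hook
# Writes a pre-commit hook into the current repository's hooks directory.
# Run it from the repository root.
install_secret_hook() {
    local hooks_dir
    hooks_dir=$(git rev-parse --git-path hooks) || return 1
    mkdir -p "$hooks_dir"
    cat > "$hooks_dir/pre-commit" <<'EOF'
#!/bin/sh
# Assumption: Gitleaks v8 CLI; a non-zero exit status aborts the commit.
exec gitleaks protect --staged --verbose
EOF
    chmod +x "$hooks_dir/pre-commit"
}
```

Detect-Secrets could be wired in the same way by swapping the gitleaks line for its own hook command, for example its detect-secrets-hook entry point with a baseline file. Because hooks live outside version control, each clone has to install the hook separately.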

