Artificial Intelligence 27 min read

Claude Code Harness: Turning Data‑Warehouse AI Coding from Ad‑hoc Queries to Rule‑Driven Automation

The article analyzes the shortcomings of current AI‑assisted data‑warehouse development—context forgetting, unstable rule enforcement, and token‑heavy operations—and presents a five‑layer Harness architecture (persistent CLAUDE.md, Auto Memory, deterministic hooks, subagents, and SKILL refactoring) that systematically resolves these issues, boosts reliability, and embeds AI into the development pipeline.

DeWu Technology

May 20, 2026

Claude Code Harness: Turning Data‑Warehouse AI Coding from Ad‑hoc Queries to Rule‑Driven Automation

Background and Pain Points

In the offline data‑warehouse teams, Claude Code is the primary AI coding tool, delivering noticeable efficiency gains. However, three structural pain points emerge during real use:

Context forgetting: When the conversation reaches the token limit (≈95% of the window), Claude’s auto‑compact compresses history, discarding temporary constraints such as "amount field unit is thousand" and causing SQL errors with a 1,000‑fold data discrepancy.

Unstable rule enforcement: Manual or prompt‑based memory of coding standards (OneData naming, INSERT must contain PARTITION, etc.) drops to 60‑70% compliance under tight schedules.

Context explosion: Large‑scale tasks (blood‑line queries, 23‑item self‑tests, data comparisons) quickly fill the context, triggering compact and leading to further “memory loss”.

The core contradiction is that more complex requirements need more AI assistance, yet they also increase the risk of context overflow.

What is Harness?

Harness is the host runtime framework of Claude Code, providing a layered responsibility model that moves deterministic operations out of the LLM’s inference loop. It consists of:

Persistent CLAUDE.md: Stores iteration‑level constraints and specifications on disk; re‑injected after each session start or compact.

Auto Memory: Writes cross‑session discoveries (e.g., field‑unit reminders) to ~/.claude/projects/<project>/memory/MEMORY.md for automatic re‑loading.

Hooks: Configured in .claude/settings.json to run deterministic checks (SQL linting, dangerous DDL blocking) without relying on Claude’s memory.

Subagents: Independent agents (sql‑validator, dw‑explorer, data‑quality‑checker) that execute high‑token operations in isolated contexts.

SKILL refactor: Reduces main‑context consumption by moving the full SKILL file execution into a subagent and exposing only the final artefacts.

Five‑Layer Defense Architecture

Layer 1 – Write into CLAUDE.md (Immediate Use)

Project root contains .claude/CLAUDE.md. Each iteration writes current status, version, and iteration‑specific constraints (e.g., prohibited table modifications, partition field format). This file is the most reliable persistence point because it is re‑read from disk after every compact.

Layer 2 – Auto Memory (Running)

Claude automatically appends key observations to MEMORY.md. Example prompts that trigger memory writes: "the amount field unit is thousand", "field_a may be null in this scenario", "V1.0 introduces a change to field_b". These entries survive across sessions and are re‑injected after compact.

Layer 3 – Hooks for Automatic Validation (Core Defense)

Hooks are defined in .claude/settings.json and the .claude/hooks/ directory. Two critical hooks are:

{
  "hooks": {
    "PostToolUse": [{
      "matcher": "Write|Edit",
      "hooks": [{
        "type": "command",
        "command": "$CLAUDE_PROJECT_DIR/.claude/hooks/validate_sql.sh",
        "timeout": 60,
        "statusMessage": "Checking SQL standards..."
      }]
    }],
    "PreToolUse": [{
      "matcher": "Bash",
      "hooks": [{
        "type": "command",
        "command": "$CLAUDE_PROJECT_DIR/.claude/hooks/block_dangerous_ddl.sh"
      }]
    }]
  }
}

Sample validate_sql.sh (SQL linting):

#!/bin/bash
INPUT=$(cat)
FILE_PATH=$(echo "$INPUT" | python3 -c "import sys,json; d=json.load(sys.stdin); print(d.get('tool_input',{}).get('file_path',''))" 2>/dev/null)
[[ "$FILE_PATH" != *.sql ]] && exit 0
[[ -z "$FILE_PATH" ]] && exit 0
SQL=$(cat "$FILE_PATH" 2>/dev/null)
[[ -z "$SQL" ]] && exit 0
ERRORS=()
# Rule 1: Disallow SELECT *
echo "$SQL" | grep -iqE 'SELECT\s+\*' && ERRORS+=("CRITICAL: SELECT * found, explicit columns required")
# Rule 2: INSERT must contain PARTITION
if echo "$SQL" | grep -iqE 'INSERT\s+(INTO|OVERWRITE)'; then
  echo "$SQL" | grep -iqE 'PARTITION\s*\(' || ERRORS+=("CRITICAL: INSERT missing PARTITION clause")
fi
# Rule 3: Avoid DOUBLE for amount fields
echo "$SQL" | grep -iqE '\bDOUBLE\b' && ERRORS+=("WARNING: Use DECIMAL(20,4) instead of DOUBLE for amount")
# Rule 4: UPDATE/DELETE need WHERE
if echo "$SQL" | grep -iqE '\b(UPDATE|DELETE)\b'; then
  echo "$SQL" | grep -iqE '\bWHERE\b' || ERRORS+=("CRITICAL: UPDATE/DELETE without WHERE")
fi
if [ ${#ERRORS[@]} -gt 0 ]; then
  echo "=== SQL VALIDATION FAILED: $FILE_PATH ===" >&2
  for err in "${ERRORS[@]}"; do echo "  $err" >&2; done
  exit 2
fi
echo "SQL validation passed: $(basename $FILE_PATH)" >&2
exit 0

Sample block_dangerous_ddl.sh (prevent DROP/TRUNCATE on production tables):

#!/bin/bash
INPUT=$(cat)
CMD=$(echo "$INPUT" | python3 -c "import sys,json; d=json.load(sys.stdin); print(d.get('tool_input',{}).get('command',''))" 2>/dev/null)
if echo "$CMD" | grep -iqE '\b(DROP\s+TABLE|TRUNCATE\s+TABLE)\b'; then
  if ! echo "$CMD" | grep -qiE '(_dev|_test|_stg)\b'; then
    echo "BLOCKED: Production DROP/TRUNCATE detected" >&2
    exit 2
  fi
fi
exit 0

Layer 4 – Subagents for Context Isolation

Three agents are defined under .claude/agents/:

---
name: sql-validator
description: ODPS/MaxCompute SQL syntax and rule validator. Runs in an isolated context.
tools: Read, Bash, Grep, Glob
model: haiku
permissionMode: dontAsk
---
You are a data‑warehouse SQL expert. Validate only, do not modify files.

Validation items (priority):
1. DISALLOW SELECT *
2. INSERT must contain PARTITION
3. Use snake_case for fields
4. Use DECIMAL for amount, not DOUBLE
5. JOIN must have ON condition
6. Detect Cartesian product risk

Output format:
- status: PASS/FAIL
- problem list (CRITICAL/WARNING/INFO)
- line‑specific suggestions

(Max 50 lines)

---
name: dw-explorer
description: Data‑warehouse structure explorer. Reads DDL, field info, and one‑level lineage without modifying files.
tools: Read, Glob, Grep, Bash
model: haiku
permissionMode: dontAsk
---
When invoked, read the specified table’s DDL, partition strategy, and upstream/downstream lineage (one level). Return a concise summary containing table basics, core field definitions, and lineage list.

Layer 5 – SKILL File Refactor (Reduce Main‑Context Load)

SKILL files (≈10 KB) previously loaded entirely into the main context, accelerating compact. The refactor moves the full SKILL execution into a subagent and replaces the file with path‑scoped rule files, e.g.:

---
# .claude/rules/etl-rules.md
paths:
  - "**/*insert*.sql"
  - "**/*_di.sql"
  - "**/*_df.sql"
---
# ETL development rules (auto‑loaded per path)
- Must have partition_dt partition
- INSERT OVERWRITE must check partition existence
- Disallow cross‑database JOIN

The main conversation now only receives the final artefact paths (e.g., ddl_table_a.sql) and a PASS/FAIL summary.

Practical Deployment Steps

Project‑level context persistence: Create .claude/CLAUDE.md with current iteration status, version, and iteration‑specific constraints. Update “正在开发” and “本次迭代约束” sections at each sprint; clear constraints after release.

Configure hooks: Add .claude/settings.json and the hooks/ scripts shown above. PostToolUse runs validate_sql.sh after every .sql write; PreToolUse runs block_dangerous_ddl.sh before any DDL command.

Create subagents: Add sql-validator.md, dw-explorer.md, and data-quality-checker.md (similar structure). These handle self‑test result aggregation, lineage queries, and performance checks in isolated contexts.

SKILL invocation redesign: Replace direct SKILL calls with subagent commands, e.g. @"sql-validator (agent)" validate path/to/insert.sql, ensuring the main context only receives concise outcomes.

Comparison with Traditional Data‑Warehouse AI Development

Traditional workflow relies on Claude’s memory for both semantic understanding and rule enforcement, leading to frequent compact‑triggered forgetting and inconsistent compliance. Harness separates concerns: deterministic hooks guarantee rule adherence, subagents protect the main context from token‑heavy operations, and persistent files ensure cross‑session stability. The result is higher accuracy, lower rework, and a measurable reduction (≈50‑70%) in compact frequency.

Outcome and Benefits

Semantic‑plus‑rule accuracy improves dramatically; field‑unit errors drop from frequent to near‑zero.

Requirement‑understanding rework decreases from ~50% to ~10% due to explicit CLAUDE.md constraints and Stop‑hook verification.

SQL‑standard compliance rises from 70‑80% (memory‑based) to >95% (hook‑enforced).

Context overflow is mitigated, cutting compact triggers by half to two‑thirds.

Conclusion

Harness does not aim to make Claude smarter; it embeds Claude’s semantic strengths into a reliable pipeline while delegating deterministic tasks to hooks, subagents, and persistent files. This four‑layer (plus SKILL) division transforms data‑warehouse AI coding from an ad‑hoc conversational aid into a rule‑driven, production‑grade automation framework.

Key Diagrams

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

LLM AI coding Data Warehouse Hooks Claude Context Management subagents

Written by

DeWu Technology

A platform for sharing and discussing tech knowledge, guiding you toward the cloud of technology.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.

Background and Pain Points

What is Harness?

Five‑Layer Defense Architecture

Layer 1 – Write into CLAUDE.md (Immediate Use)

Layer 2 – Auto Memory (Running)

Layer 3 – Hooks for Automatic Validation (Core Defense)

Layer 4 – Subagents for Context Isolation

Layer 5 – SKILL File Refactor (Reduce Main‑Context Load)

Practical Deployment Steps

Comparison with Traditional Data‑Warehouse AI Development

Outcome and Benefits

Conclusion

Key Diagrams

DeWu Technology

How this landed with the community

Was this worth your time?

0 Comments

Layer 1 – Write into CLAUDE.md (Immediate Use)

Layer 2 – Auto Memory (Running)

Layer 3 – Hooks for Automatic Validation (Core Defense)

Layer 4 – Subagents for Context Isolation

Layer 5 – SKILL File Refactor (Reduce Main‑Context Load)