Improve Rust target cleanup skill

2026-03-12 13:25:53 -07:00
committed by Kat Huang
parent df2f78d374
commit 01b79dc771
3 changed files with 315 additions and 14 deletions


@@ -7,6 +7,12 @@ description: Investigate and safely reclaim disk space on this machine, especial
Reclaim disk space with a safety-first workflow: investigate first, run obvious low-risk cleanup wins, then do targeted analysis for larger opportunities.
Bundled helpers:
- `scripts/rust_target_dirs.py`: inventory and guarded deletion for explicit Rust `target/` directories
- `references/rust-target-roots.txt`: machine-specific roots for Rust artifact scans
- `references/ignore-paths.md`: machine-specific excludes for `du`/`ncdu`
## Execution Default
- Start with non-destructive investigation and quick sizing.
@@ -19,11 +25,11 @@ Reclaim disk space with a safety-first workflow: investigate first, run obvious
1. Establish current pressure and biggest filesystems
2. Run easy cleanup wins
3. Inventory Rust build artifacts and clean the right kind of target
4. Investigate remaining heavy directories with `ncdu`/`du`
5. Investigate `/nix/store` roots when large toolchains still persist
6. Summarize reclaimed space and next candidate actions
7. Record new machine-specific ignore paths, Rust roots, or cleanup patterns in this skill
## Step 1: Baseline
@@ -66,31 +72,46 @@ npm cache clean --force
## Step 3: Rust Build Artifact Cleanup
Inventory explicit `target/` directories first using the bundled helper and the machine-specific root list in `references/rust-target-roots.txt`.
Inventory the biggest candidates:
```bash
python /home/imalison/dotfiles/dotfiles/agents/skills/disk-space-cleanup/scripts/rust_target_dirs.py list --min-size 500M --limit 30
```
Focus on stale targets only:
```bash
python /home/imalison/dotfiles/dotfiles/agents/skills/disk-space-cleanup/scripts/rust_target_dirs.py list --min-size 1G --older-than 14 --output tsv
```
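The TSV output's columns are `size_bytes`, `age_days`, `path`, and `workspace`. As a minimal sketch (the helper name `total_reclaimable` and the sample rows are illustrative, not part of the script), the size column can be summed to estimate reclaimable space before touching anything:

```python
# Sum the first TSV column (size_bytes) to estimate total
# reclaimable bytes across the listed target directories.
def total_reclaimable(tsv_text: str) -> int:
    total = 0
    for line in tsv_text.splitlines():
        if not line.strip():
            continue
        total += int(line.split("\t")[0])
    return total

# Hypothetical two-row sample in the script's TSV shape.
sample = (
    "2147483648\t21\t/home/user/proj/target\t/home/user/proj\n"
    "1073741824\t45\t/home/user/old/target\t/home/user/old\n"
)
print(total_reclaimable(sample))  # 3221225472 (about 3.0G)
```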
Use `cargo-sweep` when the repo is still active and you want age/toolchain-aware cleanup inside a workspace:
```bash
nix run nixpkgs#cargo-sweep -- sweep -d -r -t 30 <workspace-root>
nix run nixpkgs#cargo-sweep -- sweep -r -t 30 <workspace-root>
nix run nixpkgs#cargo-sweep -- sweep -d -r -i <workspace-root>
nix run nixpkgs#cargo-sweep -- sweep -r -i <workspace-root>
```
Use direct `target/` deletion when inventory shows a discrete stale directory, especially for inactive repos or project-local worktrees. The helper only deletes explicit paths named `target` that are beneath configured roots and a Cargo project:
```bash
python /home/imalison/dotfiles/dotfiles/agents/skills/disk-space-cleanup/scripts/rust_target_dirs.py delete /abs/path/to/target
python /home/imalison/dotfiles/dotfiles/agents/skills/disk-space-cleanup/scripts/rust_target_dirs.py delete /abs/path/to/target --yes
```
Recommended sequence:
1. Run `rust_target_dirs.py list` to see the largest `target/` directories across `~/Projects`, `~/org`, `~/dotfiles`, and other configured roots.
2. For active repos, prefer `cargo-sweep` from the workspace root.
3. For inactive repos, abandoned branches, and `.worktrees/*/target`, prefer guarded direct deletion of the explicit `target/` directory.
4. Re-run the list command after each deletion round to show reclaimed space.
Machine-specific note:
- Project-local `.worktrees/*/target` directories are common cleanup wins on this machine and are easy to miss with the old hard-coded workflow.
## Step 4: Investigation with `ncdu` and `du`
@@ -159,6 +180,7 @@ nix why-depends <consumer-store-path> <dependency-store-path>
Common retention pattern on this machine:
- Many `.direnv/flake-profile-*` symlinks under `~/Projects` and worktrees keep `nix-shell-env`/`ghc-shell-*` roots alive.
- Old taffybar constellation repos under `~/Projects` can pin large Haskell closures through `.direnv` and `result` symlinks. Deleting `gtk-sni-tray`, `status-notifier-item`, `dbus-menu`, `dbus-hslogger`, and `gtk-strut` and then rerunning `nix-collect-garbage -d` reclaimed about 11G of store data in one validated run.
- `find_store_path_gc_roots` is especially useful for proving GHC retention: many large `ghc-9.10.3-with-packages` paths are unique per project, while the base `ghc-9.10.3` and docs paths are shared.
- Quantify before acting:
@@ -177,6 +199,7 @@ nix-store --gc --print-roots | rg '/\\.direnv/flake-profile-' | awk -F' -> ' '{p
- Do not delete user files directly unless explicitly requested.
- Prefer cleanup tools that understand ownership/metadata (`nix`, `docker`, `podman`, `cargo-sweep`) over `rm -rf`.
- For Rust build artifacts, deleting an explicit directory literally named `target` is acceptable when it is discovered by the bundled helper; Cargo will rebuild it.
- Present a concise “proposed actions” list before high-impact deletes.
- If uncertain whether data is needed, stop at investigation and ask.
@@ -187,5 +210,6 @@ Treat this skill as a living playbook.
After each disk cleanup task:
1. Add newly discovered mountpoints or directories to ignore in `references/ignore-paths.md`.
2. Add newly discovered Rust repo roots to `references/rust-target-roots.txt`.
3. Add validated command patterns or caveats discovered during the run to this `SKILL.md`.
4. Keep instructions practical and machine-specific; remove stale guidance.


@@ -0,0 +1,6 @@
# One absolute path per line. Comments are allowed.
# Keep this list machine-specific and update it when Rust repos move.
/home/imalison/Projects
/home/imalison/org
/home/imalison/dotfiles
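The roots file format (one absolute path per line, `#` comments and blank lines allowed) can be parsed in a few lines. This standalone sketch mirrors the helper script's loading logic without the existence checks (`parse_roots` is an illustrative name, not a function in the script):

```python
def parse_roots(text: str) -> list[str]:
    roots = []
    for line in text.splitlines():
        # Strip trailing comments and surrounding whitespace.
        stripped = line.split("#", 1)[0].strip()
        if stripped:
            roots.append(stripped)
    return roots
```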


@@ -0,0 +1,271 @@
#!/usr/bin/env python3
import argparse
import json
import os
import shutil
import subprocess
import sys
import time
from pathlib import Path
SCRIPT_DIR = Path(__file__).resolve().parent
DEFAULT_ROOTS_FILE = SCRIPT_DIR.parent / "references" / "rust-target-roots.txt"
def parse_size(value: str) -> int:
text = value.strip().upper()
units = {
"B": 1,
"K": 1024,
"KB": 1024,
"M": 1024**2,
"MB": 1024**2,
"G": 1024**3,
"GB": 1024**3,
"T": 1024**4,
"TB": 1024**4,
}
    # Check longer suffixes first so "2GB" is matched by "GB",
    # not truncated to "2G" by the bare "B" unit.
    for suffix in sorted(units, key=len, reverse=True):
        if text.endswith(suffix):
            number = text[: -len(suffix)].strip()
            return int(float(number) * units[suffix])
return int(float(text))
def human_size(num_bytes: int) -> str:
value = float(num_bytes)
for unit in ["B", "K", "M", "G", "T"]:
if value < 1024 or unit == "T":
if unit == "B":
return f"{int(value)}B"
return f"{value:.1f}{unit}"
value /= 1024
return f"{num_bytes}B"
def is_relative_to(path: Path, root: Path) -> bool:
try:
path.relative_to(root)
return True
except ValueError:
return False
def load_roots(roots_file: Path, cli_roots: list[str]) -> list[Path]:
roots: list[Path] = []
for raw in cli_roots:
candidate = Path(raw).expanduser().resolve()
if candidate.exists():
roots.append(candidate)
if roots_file.exists():
for line in roots_file.read_text().splitlines():
stripped = line.split("#", 1)[0].strip()
if not stripped:
continue
candidate = Path(stripped).expanduser().resolve()
if candidate.exists():
roots.append(candidate)
unique_roots: list[Path] = []
seen: set[Path] = set()
for root in roots:
if root not in seen:
unique_roots.append(root)
seen.add(root)
return unique_roots
def du_size_bytes(path: Path) -> int:
result = subprocess.run(
["du", "-sb", str(path)],
check=True,
capture_output=True,
text=True,
)
return int(result.stdout.split()[0])
def nearest_cargo_root(path: Path, stop_roots: list[Path]) -> str:
current = path.parent
stop_root_set = set(stop_roots)
while current != current.parent:
if (current / "Cargo.toml").exists():
return str(current)
if current in stop_root_set:
break
current = current.parent
return ""
def discover_targets(roots: list[Path]) -> list[dict]:
results: dict[Path, dict] = {}
now = time.time()
for root in roots:
for current, dirnames, _filenames in os.walk(root, topdown=True):
if "target" in dirnames:
target_dir = (Path(current) / "target").resolve()
dirnames.remove("target")
if target_dir in results or not target_dir.is_dir():
continue
stat_result = target_dir.stat()
size_bytes = du_size_bytes(target_dir)
age_days = int((now - stat_result.st_mtime) // 86400)
results[target_dir] = {
"path": str(target_dir),
"size_bytes": size_bytes,
"size_human": human_size(size_bytes),
"age_days": age_days,
"workspace": nearest_cargo_root(target_dir, roots),
}
return sorted(results.values(), key=lambda item: item["size_bytes"], reverse=True)
def print_table(rows: list[dict]) -> None:
if not rows:
print("No matching Rust target directories found.")
return
size_width = max(len(row["size_human"]) for row in rows)
age_width = max(len(str(row["age_days"])) for row in rows)
print(
f"{'SIZE'.ljust(size_width)} {'AGE'.rjust(age_width)} PATH"
)
for row in rows:
print(
f"{row['size_human'].ljust(size_width)} "
f"{str(row['age_days']).rjust(age_width)}d "
f"{row['path']}"
)
def filter_rows(rows: list[dict], min_size: int, older_than: int | None, limit: int | None) -> list[dict]:
filtered = [row for row in rows if row["size_bytes"] >= min_size]
if older_than is not None:
filtered = [row for row in filtered if row["age_days"] >= older_than]
if limit is not None:
filtered = filtered[:limit]
return filtered
def cmd_list(args: argparse.Namespace) -> int:
roots = load_roots(Path(args.roots_file).expanduser(), args.root)
if not roots:
print("No scan roots available.", file=sys.stderr)
return 1
rows = discover_targets(roots)
rows = filter_rows(rows, parse_size(args.min_size), args.older_than, args.limit)
if args.output == "json":
print(json.dumps(rows, indent=2))
elif args.output == "tsv":
for row in rows:
print(
"\t".join(
[
str(row["size_bytes"]),
str(row["age_days"]),
row["path"],
row["workspace"],
]
)
)
elif args.output == "paths":
for row in rows:
print(row["path"])
else:
print_table(rows)
return 0
def validate_delete_path(path_text: str, roots: list[Path]) -> Path:
    raw = Path(path_text).expanduser()
    # Check for a symlink before resolving: resolve() follows symlinks,
    # so a symlink check on the resolved path would never fire.
    if raw.is_symlink():
        raise ValueError(f"{raw} is a symlink")
    target = raw.resolve(strict=True)
    if target.name != "target":
        raise ValueError(f"{target} is not a target directory")
    if not target.is_dir():
        raise ValueError(f"{target} is not a directory")
if not any(is_relative_to(target, root) for root in roots):
raise ValueError(f"{target} is outside configured scan roots")
if nearest_cargo_root(target, roots) == "":
raise ValueError(f"{target} is not beneath a Cargo project")
return target
def cmd_delete(args: argparse.Namespace) -> int:
roots = load_roots(Path(args.roots_file).expanduser(), args.root)
if not roots:
print("No scan roots available.", file=sys.stderr)
return 1
targets: list[Path] = []
for raw_path in args.path:
try:
targets.append(validate_delete_path(raw_path, roots))
except ValueError as exc:
print(str(exc), file=sys.stderr)
return 1
total_size = sum(du_size_bytes(target) for target in targets)
print(f"Matched {len(targets)} target directories totaling {human_size(total_size)}:")
for target in targets:
print(str(target))
if not args.yes:
print("Dry run only. Re-run with --yes to delete these target directories.")
return 0
for target in targets:
shutil.rmtree(target)
print(f"Deleted {len(targets)} target directories.")
return 0
def build_parser() -> argparse.ArgumentParser:
parser = argparse.ArgumentParser(
description="Inventory and delete Rust target directories under configured roots."
)
parser.add_argument(
"--roots-file",
default=str(DEFAULT_ROOTS_FILE),
help="Path to the newline-delimited root list.",
)
parser.add_argument(
"--root",
action="append",
default=[],
help="Additional root to scan. May be provided multiple times.",
)
subparsers = parser.add_subparsers(dest="command", required=True)
list_parser = subparsers.add_parser("list", help="List target directories.")
list_parser.add_argument("--min-size", default="0", help="Minimum size threshold, for example 500M or 2G.")
list_parser.add_argument("--older-than", type=int, help="Only include targets at least this many days old.")
list_parser.add_argument("--limit", type=int, help="Maximum number of rows to print.")
list_parser.add_argument(
"--output",
choices=["table", "tsv", "json", "paths"],
default="table",
help="Output format.",
)
list_parser.set_defaults(func=cmd_list)
delete_parser = subparsers.add_parser("delete", help="Delete explicit target directories.")
delete_parser.add_argument("path", nargs="+", help="One or more target directories to delete.")
delete_parser.add_argument("--yes", action="store_true", help="Actually delete the paths.")
delete_parser.set_defaults(func=cmd_delete)
return parser
def main() -> int:
parser = build_parser()
args = parser.parse_args()
return args.func(args)
if __name__ == "__main__":
raise SystemExit(main())
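For reference, suffix-based size parsing of the kind the script's `--min-size` flag accepts can be sketched standalone. This hypothetical helper (not the script's own `parse_size`) checks longer suffixes first so a value like `2GB` is not mis-read via the bare `B` unit:

```python
UNITS = {
    "B": 1, "K": 1024, "KB": 1024,
    "M": 1024**2, "MB": 1024**2,
    "G": 1024**3, "GB": 1024**3,
    "T": 1024**4, "TB": 1024**4,
}

def parse_size(value: str) -> int:
    text = value.strip().upper()
    # Longer suffixes first, so "GB" wins over "B".
    for suffix in sorted(UNITS, key=len, reverse=True):
        if text.endswith(suffix):
            return int(float(text[: -len(suffix)].strip()) * UNITS[suffix])
    return int(float(text))

print(parse_size("500M"))  # 524288000
print(parse_size("2GB"))   # 2147483648
```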