diff --git a/dotfiles/agents/skills/disk-space-cleanup/SKILL.md b/dotfiles/agents/skills/disk-space-cleanup/SKILL.md index 6f6460ce..68cbbba6 100644 --- a/dotfiles/agents/skills/disk-space-cleanup/SKILL.md +++ b/dotfiles/agents/skills/disk-space-cleanup/SKILL.md @@ -7,6 +7,12 @@ description: Investigate and safely reclaim disk space on this machine, especial Reclaim disk space with a safety-first workflow: investigate first, run obvious low-risk cleanup wins, then do targeted analysis for larger opportunities. +Bundled helpers: + +- `scripts/rust_target_dirs.py`: inventory and guarded deletion for explicit Rust `target/` directories +- `references/rust-target-roots.txt`: machine-specific roots for Rust artifact scans +- `references/ignore-paths.md`: machine-specific excludes for `du`/`ncdu` + ## Execution Default - Start with non-destructive investigation and quick sizing. @@ -19,11 +25,11 @@ Reclaim disk space with a safety-first workflow: investigate first, run obvious 1. Establish current pressure and biggest filesystems 2. Run easy cleanup wins -3. Sweep Rust build artifacts in common project roots +3. Inventory Rust build artifacts and clean the right kind of target 4. Investigate remaining heavy directories with `ncdu`/`du` 5. Investigate `/nix/store` roots when large toolchains still persist 6. Summarize reclaimed space and next candidate actions -7. Record new machine-specific ignore paths or cleanup patterns in this skill +7. Record new machine-specific ignore paths, Rust roots, or cleanup patterns in this skill ## Step 1: Baseline @@ -66,31 +72,46 @@ npm cache clean --force ## Step 3: Rust Build Artifact Cleanup -Target common roots first: `~/Projects` and `~/code`. +Do not start with a blind `find ~ -name target` or with hard-coded roots that may miss worktrees. Inventory explicit `target/` directories first using the bundled helper and the machine-specific root list in `references/rust-target-roots.txt`. 
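The root-list format referenced above is simple enough to sketch: one absolute path per line, with `#` starting a comment. A minimal parser, simplified from the helper's `load_roots` (the hypothetical `read_roots` below omits `~` expansion, existence checks, and de-duplication):

```python
def read_roots(text: str) -> list[str]:
    """Parse a rust-target-roots.txt style listing: one absolute path per
    line, '#' starts a comment, blank lines are skipped."""
    roots = []
    for line in text.splitlines():
        stripped = line.split("#", 1)[0].strip()
        if stripped:
            roots.append(stripped)
    return roots
```

The real helper additionally expands `~`, resolves each path, and drops entries that do not exist on disk.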
-Use `cargo-sweep` in dry-run mode before deleting: +Inventory the biggest candidates: ```bash -nix run nixpkgs#cargo-sweep -- sweep -d -r -t 30 ~/Projects ~/code +python /home/imalison/dotfiles/dotfiles/agents/skills/disk-space-cleanup/scripts/rust_target_dirs.py list --min-size 500M --limit 30 ``` -Then perform deletion: +Focus on stale targets only: ```bash -nix run nixpkgs#cargo-sweep -- sweep -r -t 30 ~/Projects ~/code +python /home/imalison/dotfiles/dotfiles/agents/skills/disk-space-cleanup/scripts/rust_target_dirs.py list --min-size 1G --older-than 14 --output tsv ``` -Alternative for toolchain churn cleanup: +Use `cargo-sweep` when the repo is still active and you want age/toolchain-aware cleanup inside a workspace: ```bash -nix run nixpkgs#cargo-sweep -- sweep -r -i ~/Projects ~/code +nix run nixpkgs#cargo-sweep -- sweep -d -r -t 30  # dry run: show artifacts older than 30 days +nix run nixpkgs#cargo-sweep -- sweep -r -t 30     # delete artifacts older than 30 days +nix run nixpkgs#cargo-sweep -- sweep -d -r -i     # dry run: show artifacts from non-installed toolchains +nix run nixpkgs#cargo-sweep -- sweep -r -i        # delete artifacts from non-installed toolchains +``` + +Use direct `target/` deletion when inventory shows a discrete stale directory, especially for inactive repos or project-local worktrees. The helper only deletes explicit paths named `target` that are beneath configured roots and a Cargo project: + +```bash +python /home/imalison/dotfiles/dotfiles/agents/skills/disk-space-cleanup/scripts/rust_target_dirs.py delete /abs/path/to/target +python /home/imalison/dotfiles/dotfiles/agents/skills/disk-space-cleanup/scripts/rust_target_dirs.py delete /abs/path/to/target --yes ``` Recommended sequence: -1. Run `-t 30` first for age-based stale builds. -2. Run a dry-run with `-i` next. -3. Apply `-i` when dry-run shows significant reclaimable space. +1. Run `rust_target_dirs.py list` to see the largest `target/` directories across `~/Projects`, `~/org`, `~/dotfiles`, and other configured roots. +2. For active repos, prefer `cargo-sweep` from the workspace root. +3. 
For inactive repos, abandoned branches, and `.worktrees/*/target`, prefer guarded direct deletion of the explicit `target/` directory. +4. Re-run the list command after each deletion round to show reclaimed space. + +Machine-specific note: + +- Project-local `.worktrees/*/target` directories are common cleanup wins on this machine and are easy to miss with the old hard-coded workflow. ## Step 4: Investigation with `ncdu` and `du` @@ -159,6 +180,7 @@ nix why-depends Common retention pattern on this machine: - Many `.direnv/flake-profile-*` symlinks under `~/Projects` and worktrees keep `nix-shell-env`/`ghc-shell-*` roots alive. +- Old taffybar constellation repos under `~/Projects` can pin large Haskell closures through `.direnv` and `result` symlinks. Deleting `gtk-sni-tray`, `status-notifier-item`, `dbus-menu`, `dbus-hslogger`, and `gtk-strut` and then rerunning `nix-collect-garbage -d` reclaimed about 11G of store data in one validated run. - `find_store_path_gc_roots` is especially useful for proving GHC retention: many large `ghc-9.10.3-with-packages` paths are unique per project, while the base `ghc-9.10.3` and docs paths are shared. - Quantify before acting: @@ -177,6 +199,7 @@ nix-store --gc --print-roots | rg '/\\.direnv/flake-profile-' | awk -F' -> ' '{p - Do not delete user files directly unless explicitly requested. - Prefer cleanup tools that understand ownership/metadata (`nix`, `docker`, `podman`, `cargo-sweep`) over `rm -rf`. +- For Rust build artifacts, deleting an explicit directory literally named `target` is acceptable when it is discovered by the bundled helper; Cargo will rebuild it. - Present a concise “proposed actions” list before high-impact deletes. - If uncertain whether data is needed, stop at investigation and ask. @@ -187,5 +210,6 @@ Treat this skill as a living playbook. After each disk cleanup task: 1. Add newly discovered mountpoints or directories to ignore in `references/ignore-paths.md`. -2. 
Add validated command patterns or caveats discovered during the run to this `SKILL.md`. -3. Keep instructions practical and machine-specific; remove stale guidance. +2. Add newly discovered Rust repo roots to `references/rust-target-roots.txt`. +3. Add validated command patterns or caveats discovered during the run to this `SKILL.md`. +4. Keep instructions practical and machine-specific; remove stale guidance. diff --git a/dotfiles/agents/skills/disk-space-cleanup/references/rust-target-roots.txt b/dotfiles/agents/skills/disk-space-cleanup/references/rust-target-roots.txt new file mode 100644 index 00000000..ab809381 --- /dev/null +++ b/dotfiles/agents/skills/disk-space-cleanup/references/rust-target-roots.txt @@ -0,0 +1,6 @@ +# One absolute path per line. Comments are allowed. +# Keep this list machine-specific and update it when Rust repos move. + +/home/imalison/Projects +/home/imalison/org +/home/imalison/dotfiles diff --git a/dotfiles/agents/skills/disk-space-cleanup/scripts/rust_target_dirs.py b/dotfiles/agents/skills/disk-space-cleanup/scripts/rust_target_dirs.py new file mode 100755 index 00000000..aa0f7ecf --- /dev/null +++ b/dotfiles/agents/skills/disk-space-cleanup/scripts/rust_target_dirs.py @@ -0,0 +1,272 @@ +#!/usr/bin/env python3 + +import argparse +import json +import os +import shutil +import subprocess +import sys +import time +from pathlib import Path + + +SCRIPT_DIR = Path(__file__).resolve().parent +DEFAULT_ROOTS_FILE = SCRIPT_DIR.parent / "references" / "rust-target-roots.txt" + + +def parse_size(value: str) -> int: + text = value.strip().upper() + # Two-character suffixes must be checked before their one-character prefixes. + units = { + "TB": 1024**4, + "GB": 1024**3, + "MB": 1024**2, + "KB": 1024, + "T": 1024**4, + "G": 1024**3, + "M": 1024**2, + "K": 1024, + "B": 1, + } + for suffix, multiplier in units.items(): + if text.endswith(suffix): + number = text[: -len(suffix)].strip() + return int(float(number) * multiplier) + return int(float(text)) + + +def human_size(num_bytes: int) -> str: + value = float(num_bytes) + 
for unit in ["B", "K", "M", "G", "T"]: + if value < 1024 or unit == "T": + if unit == "B": + return f"{int(value)}B" + return f"{value:.1f}{unit}" + value /= 1024 + return f"{num_bytes}B" + + +def is_relative_to(path: Path, root: Path) -> bool: + try: + path.relative_to(root) + return True + except ValueError: + return False + + +def load_roots(roots_file: Path, cli_roots: list[str]) -> list[Path]: + roots: list[Path] = [] + for raw in cli_roots: + candidate = Path(raw).expanduser().resolve() + if candidate.exists(): + roots.append(candidate) + + if roots_file.exists(): + for line in roots_file.read_text().splitlines(): + stripped = line.split("#", 1)[0].strip() + if not stripped: + continue + candidate = Path(stripped).expanduser().resolve() + if candidate.exists(): + roots.append(candidate) + + unique_roots: list[Path] = [] + seen: set[Path] = set() + for root in roots: + if root not in seen: + unique_roots.append(root) + seen.add(root) + return unique_roots + + +def du_size_bytes(path: Path) -> int: + result = subprocess.run( + ["du", "-sb", str(path)], + check=True, + capture_output=True, + text=True, + ) + return int(result.stdout.split()[0]) + + +def nearest_cargo_root(path: Path, stop_roots: list[Path]) -> str: + current = path.parent + stop_root_set = set(stop_roots) + while current != current.parent: + if (current / "Cargo.toml").exists(): + return str(current) + if current in stop_root_set: + break + current = current.parent + return "" + + +def discover_targets(roots: list[Path]) -> list[dict]: + results: dict[Path, dict] = {} + now = time.time() + for root in roots: + for current, dirnames, _filenames in os.walk(root, topdown=True): + if "target" in dirnames: + target_dir = (Path(current) / "target").resolve() + dirnames.remove("target") + if target_dir in results or not target_dir.is_dir(): + continue + stat_result = target_dir.stat() + size_bytes = du_size_bytes(target_dir) + age_days = int((now - stat_result.st_mtime) // 86400) + results[target_dir] 
= { + "path": str(target_dir), + "size_bytes": size_bytes, + "size_human": human_size(size_bytes), + "age_days": age_days, + "workspace": nearest_cargo_root(target_dir, roots), + } + return sorted(results.values(), key=lambda item: item["size_bytes"], reverse=True) + + +def print_table(rows: list[dict]) -> None: + if not rows: + print("No matching Rust target directories found.") + return + size_width = max(len(row["size_human"]) for row in rows) + age_width = max(len(str(row["age_days"])) for row in rows) + print( + f"{'SIZE'.ljust(size_width)} {'AGE'.rjust(age_width)} PATH" + ) + for row in rows: + print( + f"{row['size_human'].ljust(size_width)} " + f"{str(row['age_days']).rjust(age_width)}d " + f"{row['path']}" + ) + + +def filter_rows(rows: list[dict], min_size: int, older_than: int | None, limit: int | None) -> list[dict]: + filtered = [row for row in rows if row["size_bytes"] >= min_size] + if older_than is not None: + filtered = [row for row in filtered if row["age_days"] >= older_than] + if limit is not None: + filtered = filtered[:limit] + return filtered + + +def cmd_list(args: argparse.Namespace) -> int: + roots = load_roots(Path(args.roots_file).expanduser(), args.root) + if not roots: + print("No scan roots available.", file=sys.stderr) + return 1 + rows = discover_targets(roots) + rows = filter_rows(rows, parse_size(args.min_size), args.older_than, args.limit) + + if args.output == "json": + print(json.dumps(rows, indent=2)) + elif args.output == "tsv": + for row in rows: + print( + "\t".join( + [ + str(row["size_bytes"]), + str(row["age_days"]), + row["path"], + row["workspace"], + ] + ) + ) + elif args.output == "paths": + for row in rows: + print(row["path"]) + else: + print_table(rows) + return 0 + + +def validate_delete_path(path_text: str, roots: list[Path]) -> Path: + target = Path(path_text).expanduser().resolve(strict=True) + if target.name != "target": + raise ValueError(f"{target} is not a target directory") + # Check the supplied path, not the resolved one: resolve() follows symlinks, + # so the resolved path can never itself be a symlink. + if Path(path_text).expanduser().is_symlink(): + 
raise ValueError(f"{target} is a symlink") + if not target.is_dir(): + raise ValueError(f"{target} is not a directory") + if not any(is_relative_to(target, root) for root in roots): + raise ValueError(f"{target} is outside configured scan roots") + if nearest_cargo_root(target, roots) == "": + raise ValueError(f"{target} is not beneath a Cargo project") + return target + + +def cmd_delete(args: argparse.Namespace) -> int: + roots = load_roots(Path(args.roots_file).expanduser(), args.root) + if not roots: + print("No scan roots available.", file=sys.stderr) + return 1 + + targets: list[Path] = [] + for raw_path in args.path: + try: + targets.append(validate_delete_path(raw_path, roots)) + except ValueError as exc: + print(str(exc), file=sys.stderr) + return 1 + + total_size = sum(du_size_bytes(target) for target in targets) + print(f"Matched {len(targets)} target directories totaling {human_size(total_size)}:") + for target in targets: + print(str(target)) + + if not args.yes: + print("Dry run only. Re-run with --yes to delete these target directories.") + return 0 + + for target in targets: + shutil.rmtree(target) + print(f"Deleted {len(targets)} target directories.") + return 0 + + +def build_parser() -> argparse.ArgumentParser: + parser = argparse.ArgumentParser( + description="Inventory and delete Rust target directories under configured roots." + ) + parser.add_argument( + "--roots-file", + default=str(DEFAULT_ROOTS_FILE), + help="Path to the newline-delimited root list.", + ) + parser.add_argument( + "--root", + action="append", + default=[], + help="Additional root to scan. 
May be provided multiple times.", + ) + + subparsers = parser.add_subparsers(dest="command", required=True) + + list_parser = subparsers.add_parser("list", help="List target directories.") + list_parser.add_argument("--min-size", default="0", help="Minimum size threshold, for example 500M or 2G.") + list_parser.add_argument("--older-than", type=int, help="Only include targets at least this many days old.") + list_parser.add_argument("--limit", type=int, help="Maximum number of rows to print.") + list_parser.add_argument( + "--output", + choices=["table", "tsv", "json", "paths"], + default="table", + help="Output format.", + ) + list_parser.set_defaults(func=cmd_list) + + delete_parser = subparsers.add_parser("delete", help="Delete explicit target directories.") + delete_parser.add_argument("path", nargs="+", help="One or more target directories to delete.") + delete_parser.add_argument("--yes", action="store_true", help="Actually delete the paths.") + delete_parser.set_defaults(func=cmd_delete) + + return parser + + +def main() -> int: + parser = build_parser() + args = parser.parse_args() + return args.func(args) + + +if __name__ == "__main__": + raise SystemExit(main())
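
The guard chain that `delete` applies can be restated as a tiny pure function for review purposes. This is a sketch only: `would_allow_delete` and its `cargo_roots` argument are hypothetical names, and the real `validate_delete_path` additionally resolves the path, rejects symlinks, and walks the filesystem looking for `Cargo.toml` instead of taking a precomputed list:

```python
from pathlib import PurePosixPath


def _is_under(path: PurePosixPath, root: PurePosixPath) -> bool:
    # True when path is root itself or lives somewhere beneath it.
    return path == root or root in path.parents


def would_allow_delete(path: str, scan_roots: list[str], cargo_roots: list[str]) -> bool:
    p = PurePosixPath(path)
    if p.name != "target":  # guard 1: only directories literally named "target"
        return False
    if not any(_is_under(p, PurePosixPath(r)) for r in scan_roots):
        return False        # guard 2: must sit beneath a configured scan root
    # guard 3: a known Cargo project dir must be an ancestor of the parent
    return any(_is_under(p.parent, PurePosixPath(c)) for c in cargo_roots)
```

Each guard must pass before `--yes` triggers `shutil.rmtree`, which is why a stray path like `/tmp/target` or a non-Cargo directory under `~/Projects` is rejected before any deletion happens.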