gremlin / space-investigator

Disk space analysis. Point it at a directory, get the biggest files and directories back. Written in Rust, walks filesystems in parallel.

Rust · MIT License · Linux / macOS / Windows

Where this came from

I used to work at Rackspace, over a decade ago now. The senior engineers there kept an internal wiki of useful scripts, and over a few years it turned into this massive collection of one-liners for every situation you'd run into on a customer's server.

Some of the scripts needed extra tools the original author had installed. You'd copy a command, paste it on a box, and it'd fail because ncdu or sar or some other utility wasn't there. So you'd end up with multiple versions of the same script, slightly different depending on which tools the author preferred. Some people focused on writing versions that only used tools from the base kickstart install so they'd work on any box, which got you stuff like this:

The original
FS='./';resize;clear;date;df -h $FS; echo "Largest Directories:"; \
nice -n19 find $FS -mount -type d -print0 2>/dev/null|xargs -0 du -k| \
sort -runk1|head -n20|awk -F'\t' '{printf "%8d MB\t%s\n",($1/1024),$NF}'; \
echo "Largest Files:"; nice -n 19 find $FS -mount -type f -print0 2>/dev/null| \
xargs -0 du -k | sort -rnk1| head -n20 | \
awk -F'\t' '{printf "%8d MB\t%s\n",($1/1024),$NF}';

It works. Nobody is typing that from memory, though. The knowledge base was probably the most visited page on our wiki; everyone had it bookmarked just for commands like this.

The other problem: on a filesystem with thousands of small files, it'd take forever. Single-threaded find piped to du, waiting, waiting.

So this is a Rust rewrite of that snippet. Same job, but parallel, and you just type si /path.

Features

Filesystem info
Total, used, and available space for the target mount, like df -h.
Parallel walking
Uses jwalk to traverse directories across multiple threads.
Mount boundaries
Won't cross into other filesystems. Same as find -mount.
JSON output
Pass --json to get structured output you can feed into jq or other tools.

Install

Download a binary
# grab the latest release for your platform
$ curl -LO https://github.com/gremlinltd/space-investigator/releases/latest/download/si-<target>.tar.gz
$ tar xzf si-*.tar.gz
$ ./si --help
Build from source
$ git clone https://github.com/gremlinltd/space-investigator.git
$ cd space-investigator
$ cargo install --path .

Running it

$ si [OPTIONS] [PATH]
Flag                What it does               Default
PATH                Directory to scan          .
-d, --dirs <N>      Top directories to show    20
-f, --files <N>     Top files to show          20
-j, --json          Output JSON                off

Examples

# scan current directory
$ si

# scan /var, show top 10 of each
$ si /var -d 10 -f 10

# JSON output, pipe to jq
$ si /home --json | jq '.largest_files[:5]'

Structured output

The --json flag gives you everything in a machine-readable format:

{
  "timestamp": "2026-04-09T20:57:03+03:00",
  "path": "/home",
  "filesystem": {
    "name": "Macintosh HD",
    "total_bytes": 1995218165760,
    "used_bytes": 1241711876544,
    "available_bytes": 753506289216,
    "use_percent": 62
  },
  "largest_directories": [
    { "path": "/home/user/projects", "size_bytes": 5368709120, "size_mb": 5120 }
  ],
  "largest_files": [
    { "path": "/home/user/backup.tar.gz", "size_bytes": 2147483648, "size_mb": 2048 }
  ]
}