Disk space analysis. Point it at a directory, get the biggest files and directories back. Written in Rust, walks filesystems in parallel.
I used to work at Rackspace, over a decade ago now. The senior engineers there kept an internal wiki of useful scripts, and over a few years it turned into this massive collection of one-liners for every situation you'd run into on a customer's server.
Some of the scripts needed extra tools the original author had installed.
You'd copy a command, paste it on a box, and it'd fail because
ncdu or sar or some other utility wasn't
there. So you'd end up with multiple versions of the same script,
slightly different depending on which tools the author preferred.
Some people focused on writing versions that only used tools from
the base kickstart install so they'd work on any box, which got
you stuff like this:
```shell
FS='./';resize;clear;date;df -h $FS; echo "Largest Directories:"; \
nice -n19 find $FS -mount -type d -print0 2>/dev/null|xargs -0 du -k| \
sort -runk1|head -n20|awk -F'\t' '{printf "%8d MB\t%s\n",($1/1024),$NF}'; \
echo "Largest Files:"; nice -n 19 find $FS -mount -type f -print0 2>/dev/null| \
xargs -0 du -k | sort -rnk1| head -n20 | \
awk -F'\t' '{printf "%8d MB\t%s\n",($1/1024),$NF}';
```
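For anyone squinting at that: here is the same pipeline unrolled, minus the terminal housekeeping (resize, clear, date), with each stage annotated. The behavior is unchanged.

```shell
#!/bin/sh
FS='.'

df -h "$FS"

echo "Largest Directories:"
# -mount: stay on this filesystem; -print0 / xargs -0: survive spaces in
# names; du -k: sizes in KB; sort -runk1: numeric, descending, deduplicated
# on the first field; head keeps the top 20; awk converts KB to MB.
nice -n 19 find "$FS" -mount -type d -print0 2>/dev/null \
  | xargs -0 du -k \
  | sort -runk1 \
  | head -n 20 \
  | awk -F'\t' '{printf "%8d MB\t%s\n", ($1/1024), $NF}'

echo "Largest Files:"
nice -n 19 find "$FS" -mount -type f -print0 2>/dev/null \
  | xargs -0 du -k \
  | sort -rnk1 \
  | head -n 20 \
  | awk -F'\t' '{printf "%8d MB\t%s\n", ($1/1024), $NF}'
```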
It works. Nobody is typing that from memory, though. The knowledge base was probably the most visited page on our wiki; everyone had it bookmarked just for commands like this.
The other problem: on a filesystem with thousands of small files, it'd
take forever. Single-threaded find piped to
du, waiting, waiting.
So this is a Rust rewrite of that snippet. Same job, but parallel, and
you just type si /path.
```shell
# grab the latest release for your platform
$ curl -LO https://github.com/gremlinltd/space-investigator/releases/latest/download/si-<target>.tar.gz
$ tar xzf si-*.tar.gz
$ ./si --help
```

Or build from source:

```shell
$ git clone https://github.com/gremlinltd/space-investigator.git
$ cd space-investigator
$ cargo install --path .
```
```shell
$ si [OPTIONS] [PATH]
```
| Option | What it does | Default |
|---|---|---|
| `PATH` | Directory to scan | `.` |
| `-d, --dirs <N>` | Top directories to show | 20 |
| `-f, --files <N>` | Top files to show | 20 |
| `-j, --json` | Output JSON | off |
```shell
# scan current directory
$ si

# scan /var, show top 10 of each
$ si /var -d 10 -f 10

# JSON output, pipe to jq
$ si /home --json | jq '.largest_files[:5]'
```
The --json flag gives you everything in a machine-readable format:
```json
{
  "timestamp": "2026-04-09T20:57:03+03:00",
  "path": "/home",
  "filesystem": {
    "name": "Macintosh HD",
    "total_bytes": 1995218165760,
    "used_bytes": 1241711876544,
    "available_bytes": 753506289216,
    "use_percent": 62
  },
  "largest_directories": [
    { "path": "/home/user/projects", "size_bytes": 5368709120, "size_mb": 5120 }
  ],
  "largest_files": [
    { "path": "/home/user/backup.tar.gz", "size_bytes": 2147483648, "size_mb": 2048 }
  ]
}
```
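The JSON output is handy for cron-style monitoring. A sketch, in the spirit of the base-install scripts above (no jq required, just sed): the field name comes from the sample schema, the 90% threshold is arbitrary, and the heredoc stands in for a real `si / --json` call.

```shell
#!/bin/sh
# Hypothetical disk-usage check built on `si --json`.
# In real use, replace the heredoc with:  json=$(si / --json)
json=$(cat <<'EOF'
{
  "path": "/",
  "filesystem": {
    "name": "root",
    "use_percent": 62
  }
}
EOF
)

# Pull the integer after "use_percent" with plain sed -- present on any box.
pct=$(printf '%s' "$json" | sed -n 's/.*"use_percent": *\([0-9]*\).*/\1/p')

# Arbitrary alert threshold for illustration.
if [ "${pct:-0}" -ge 90 ]; then
  echo "ALERT: filesystem ${pct}% full"
else
  echo "OK: filesystem ${pct}% full"
fi
```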