What do shells do, and do we need them?
Diving into what shells do, and an introduction to unsh, the unshell.
Recently, I wrote about Evcxr, a REPL and Jupyter notebook kernel for Rust. In that post, I made the assertion that shells are basically REPLs, and that we should replace REPLs with notebooks. I think that’s true, but I still think there is a place for something like a shell, albeit a very restricted one.
Like any technology that has been around for decades, shells have a lot of features and responsibilities that were gained through gradual accretion. If a shell does almost everything you want except for the one thing you need right now, there’s a heavy temptation to add that one thing rather than reinvent everything from scratch. Do that long enough and you end up with some things that a shell does well, and lots of things it does less well but, you know, well enough.
Some people have looked at the deficiencies of the shell[1] and thought “How do we make this better?” Both fish shell and nushell are attempts to take what shells do and improve on it by making the shell more capable and its scripting language less crufty. In contrast, I think we should strip away responsibilities from the shell and hand them off to tools that are better suited.
Let’s take a look at what shells do, and see if we can tease apart what their strengths are and where other tools have come along and supplanted them. Then we can come back and see what’s left and see if there’s something better to be built. It’s usually a good idea to learn what something is doing before tearing it down.
What capabilities do shells provide?
A way to spawn processes and pass startup options
In functional languages like Haskell, applying a function to its arguments is the most common and important operation you’ll do, so the syntax is very lightweight. No parentheses, no commas, just spaces between the function and its arguments. f a b is the syntax for calling a function, which is equivalent to f(a, b) in most imperative languages[2]. Removing that little bit of syntactic noise is worth it because function application is so central.
Likewise in a shell, spawning a process and passing it options is the central activity. Compare the way we invoke commands in a shell:
$ echo hello world
with how we call a process in Rust:
std::process::Command::new("echo").args(&["hello", "world"]).spawn();
Shells have reduced their overall syntactic flexibility in order to make spawning processes maximally convenient. Simple things like arithmetic and if/then conditions are syntactically very noisy in shell scripts, but spawning a process and passing it an argv is about as easy as it gets.
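To make the comparison concrete, here is a slightly fuller sketch of the Rust side that also captures the child’s output. The helper name run_and_capture is made up for this example, and it assumes an echo binary is available on the path.

```rust
use std::process::Command;

/// Runs `program` with `args` and returns whatever it wrote to stdout.
/// (Hypothetical helper for illustration only.)
fn run_and_capture(program: &str, args: &[&str]) -> std::io::Result<String> {
    // `output()` spawns the child, waits for it to exit, and collects its stdout.
    let output = Command::new(program).args(args).output()?;
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}

fn main() -> std::io::Result<()> {
    // Roughly what typing `echo hello world` at a prompt does.
    let text = run_and_capture("echo", &["hello", "world"])?;
    print!("{}", text);
    Ok(())
}
```

Even in this small form there is a builder, an argument slice, and error handling to think about, none of which the shell asks of you.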
A way to navigate the filesystem
In a shell, cd and ls are about the most common commands I use. In fact, cd is so common we might not often pause to consider that it’s usually a shell builtin and not a separate executable on the path. If you write a naive cd program and invoke it in a shell, it’ll just change the directory of the subprocess and then exit, leaving the shell’s current directory unchanged.
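You can watch this happen from any parent process, not just a shell. The sketch below assumes a Unix-like system with sh on the path; the function name is invented for the example.

```rust
use std::env;
use std::path::PathBuf;
use std::process::Command;

/// Spawns a child shell that runs `cd /`, and reports the parent's
/// working directory before and after. (Hypothetical helper.)
fn cwd_before_and_after_child_cd() -> std::io::Result<(PathBuf, PathBuf)> {
    let before = env::current_dir()?;
    // The `cd` happens inside the child, which then exits and takes
    // its working directory with it.
    Command::new("sh").args(&["-c", "cd /"]).status()?;
    let after = env::current_dir()?;
    Ok((before, after))
}

fn main() -> std::io::Result<()> {
    let (before, after) = cwd_before_and_after_child_cd()?;
    println!("before child cd: {}", before.display());
    println!("after child cd:  {}", after.display());

    // Only a call made inside our own process moves us, which is why
    // shells implement `cd` themselves.
    env::set_current_dir("/")?;
    println!("after set_current_dir: {}", env::current_dir()?.display());
    Ok(())
}
```

The first two lines printed are the same directory; only the in-process call changes it.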
I have a graphical file explorer on my system, but I don’t often use it. When I want to navigate somewhere on the filesystem I usually want to invoke some commands in that directory immediately afterward. It’s much easier for me to just cd some/directory than to click my way there in a GUI.
A default user interface for simple programs
Don’t want to think about your user interface? Just output text. Most languages[3] make writing to stdout very easy, so it’s a very low friction way to allow your program to communicate with humans. For many programs that don’t need heavy interaction with the user, you accept arguments through the command line and output to stdout.
Imagine needing to construct a GUI text box in order to display output to a user. That would introduce a huge amount of overhead to creating small programs and testing things out. Instead the shell provides the default UI: text scrolls up as the program executes, which the user can scroll back through if they want. Then the shell returns to the prompt when the program exits.
There are downsides to this universal interface. It relies on the user already having a gigantic library installed in their brain: the entire human language your program is written in, plus the ability to read the written form of that language. Most everyone can drag a slider or click a button, but a text interface is completely useless to someone who doesn’t know the language or how to read.
A way to connect the output of one program to the input of another
Besides spawning processes, this is the second thing the shell makes delightfully easy. The boundary between the pipe character and shell scripting language itself is a bit blurry, but it’s worth calling it out separately because all by itself it makes shells significantly more capable even without other programming language features like variables, loops and conditions.
It works really well because it turns out it’s the same thing as function composition. There are entire categories of math devoted to how much power you get just by composing functions. To make it explicit, here are three alternative syntaxes for the same thing:
cat /dev/urandom | head -c 200 | base64
base64(head(cat("/dev/urandom"), 200))
cat("/dev/urandom").head(bytes=200).base64()
Technically, processes return a second bit of information, the exit code, which makes this composition a little fiddly. But the basics are the same.
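For a sense of what the pipe character is saving you, here is roughly what echo hello | tr a-z A-Z looks like when wired up by hand in Rust. This is only a sketch: the function name shout is made up, and it assumes echo and tr exist on the path.

```rust
use std::io::Result;
use std::process::{Command, Stdio};

/// Rough equivalent of `echo <input> | tr a-z A-Z`. (Hypothetical helper.)
fn shout(input: &str) -> Result<String> {
    // First stage: write to a pipe instead of the terminal.
    let mut producer = Command::new("echo")
        .arg(input)
        .stdout(Stdio::piped())
        .spawn()?;

    // Second stage: its stdin is the first stage's stdout.
    let upstream = producer.stdout.take().expect("stdout was piped");
    let output = Command::new("tr")
        .args(&["a-z", "A-Z"])
        .stdin(Stdio::from(upstream))
        .output()?;

    // Reap the first child so it does not linger as a zombie.
    producer.wait()?;
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}

fn main() -> Result<()> {
    print!("{}", shout("hello")?);
    Ok(())
}
```

Note that the exit codes of both stages are simply dropped here, which is the fiddly part the shell also has to paper over.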
A resizable 2D grid that can display characters in color
This is getting into the border between the shell and the terminal emulator. Really it’s the terminal emulator that interprets various escape sequences and displays color and implements raw mode which enables writing to arbitrary character locations. But if we are willing to conflate shell and terminal emulator momentarily, command lines have the capability to create rudimentary graphical interfaces.
Sure, we have to use some weird box-drawing Unicode characters, but limited graphical capabilities at least on par with xterm are expected by many common tools. I mean, heck, even systemctl status uses color to indicate the status of services.
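The mechanism behind that color is just bytes in the output stream. Here is a minimal sketch using the standard SGR escape sequences; the colorize helper and the service names are invented for the example, and this is not how systemctl itself is implemented.

```rust
/// Wraps `text` in the escape sequences for a foreground color.
/// In the basic 8-color palette, 31 is red, 32 is green, 33 is yellow.
/// (Hypothetical helper for illustration.)
fn colorize(text: &str, color_code: u8) -> String {
    // ESC [ <code> m switches the color on; ESC [ 0 m resets all attributes.
    format!("\x1b[{}m{}\x1b[0m", color_code, text)
}

fn main() {
    // Made-up service names, just to show a green and a red status dot.
    println!("{} web.service", colorize("●", 32));
    println!("{} db.service", colorize("●", 31));
}
```

The program never learns whether anything on the other end understood those bytes. It is the terminal emulator that decides to paint the dot green.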
A simple programming environment
After piping, there are some basic programming features like variables, for loops, if statements etc. This is what tempts you to solve problems in a shell script rather than back up and realize you should switch over to solving the problem with a general purpose programming language.
The friction is so low though! Maybe by the time you’ve crafted half of the solution to your problem by piping shell commands together, there’s a little bit of sunk cost calculation going on. Do I really want to rewrite this entire thing over again in a completely different language, or should I just go look up the difference between [ and [[ one more time and bang this thing out?
Personally, I think this is the original sin of the shell. It gives you just enough rope to hang yourself. Once you’ve solved your problem in a shell script, doesn’t it make sense to just commit that as your code? And once you’re committing your shell scripts, doesn’t it make sense to add a linter like ShellCheck to your CI pipeline? This is the point at which the slippery slope begins.
A persistent global history buffer of previous commands
This might seem like a minor point, but I regularly use my shell command history as a very limited second brain. Especially with fancy history autocompletion, this becomes a really powerful way to keep arcane program invocations stored… somewhere. And, in contrast to curating and storing your useful commands in a wiki or something (probably copy/pasting it back and forth), searching the command history automatically puts the command exactly where it needs to be to execute it immediately.
A lowest common denominator execution environment
Because it’s always available on *nix systems, shell scripts offer a language to bootstrap other features from. macOS won’t execute ELF binaries, but if you wrap your downloader in a shell script like rustup does, you can intelligently interrogate the host system and download the appropriate binary file for it. This makes distributing tools like rustup much easier.
Build systems commonly use this strategy to decide which tools are available to execute with. For example, autotools is essentially a gigantic shell script that checks the capabilities of a UNIX system extensively in order to determine the best way to compile a C program.
What are shells to a human?
So far I’ve talked about a bunch of technical capabilities and features of shells, but one layer above that is what roles shells play at a human level. By that I mean, what is a human trying to do when they reach for a shell? If there are two equivalent ways to do something, when do we choose to use a shell to do it?
A place to execute non-GUI programs
You could make the argument that the dividing line between an app and a program is whether you invoke it from the shell or not. Programs are available on your $PATH, not your desktop. They’re words you memorize and type out, not icons you click. They’re intended to be tools used in combination with other tools, not self-contained applications. The shell is how you access and configure programs like this.
A sense of location
When you cd into a directory, you are navigating there mentally. This is repurposing the brain’s well developed spatial awareness. From the computer’s point of view, absolute paths and relative paths are equivalent. From a human’s point of view, relative paths are a natural way of specifying directions that originate from a particular starting point.
You can see this effect in tools like git, which looks at the current directory to decide which project you’re working on. When non-technical users interact with computers, they usually don’t think of themselves being in a particular place in the filesystem. The location is either the desktop or an app. When you’re in the shell, you’re wherever your current directory is.
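As an illustration of “where you are decides what you’re working on”, here is a simplified sketch of that kind of lookup: walk upward from the current directory until something that looks like a project root appears. The function name is made up, and git’s real discovery rules are more involved than checking for a .git entry.

```rust
use std::path::{Path, PathBuf};

/// Walks from `start` up through its parents and returns the first
/// directory that contains a `.git` entry, if any.
/// (Simplified, hypothetical stand-in for what a tool like git does.)
fn find_project_root(start: &Path) -> Option<PathBuf> {
    start
        .ancestors()
        .find(|dir| dir.join(".git").exists())
        .map(Path::to_path_buf)
}

fn main() -> std::io::Result<()> {
    let here = std::env::current_dir()?;
    match find_project_root(&here) {
        Some(root) => println!("project root: {}", root.display()),
        None => println!("not inside a project"),
    }
    Ok(())
}
```

The only input is the current directory, which is exactly the bit of state that cd exists to manipulate.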
A place to tinker and figure things out
This is enabled by the REPL nature of the shell, and the interaction between the UNIX philosophy of “everything is a file” and the ability to pipe data around. When Bret Victor talks about visual feedback and showing the data being more important than showing the code, it becomes obvious that the shell provides a crude form of that.
The arcane invocation you’re building up piece by piece may be difficult to understand by reading it, but that’s because you’re not reading the code, you’re running the code and seeing what it outputs. The process is to incrementally build the code to get the answer you’re looking for.
A stylized workflow in this mode is something like:
Do a basic query, see what the output looks like
Pipe the output into some program like awk or jq and start digging for the piece of the output you need
Use Ctrl+R and ⬆️ to modify the previous version incrementally and re-execute
Once the query portion is what you expect, simulate the write step by adding an echo that prints what command would be invoked
When you’re comfortable with that, swap out the echo for the actual write operation
Walk away. Don’t commit the result to source control or refactor it; this is ephemeral write-only code for a one-off job
A place for hacker aesthetic
I’d be remiss if I didn’t mention that things running in the shell have a certain aesthetic that many developers find pleasing. The demo for the terminal UI kit tui-rs is undeniably cool.
Why is that? If someone made the same interface in a GUI toolkit, it wouldn’t look nearly as cool to me. I say this as someone who has used a shell daily for almost two decades, and the novelty somehow hasn’t worn off.
If you want to build something with a terminal UI like this, your target audience is necessarily programmers and other technical users. Non-technical users are not going to put up with this stuff. The barrier to entry the shell puts up means programmers can have their dorky tools and make them the way they want, without anyone complaining that it looks ugly.
This also functions as a kind of costly signal: if a tool has an intricate terminal UI, it’s a good bet the creator of the tool cares about programmers. The code needed to write a terminal UI is just as complex as the code needed to output a basic HTML interface or use a standard OS GUI library. The difference is purely in the rendering layer, and the implicit message is “I’m rendering this for you, fellow hacker.”
What replaces the shell?
This is a big topic and there are lots of answers, but I think we can point out some general replacements for aspects of what the shell does.
Terminal UIs become browser interfaces
This trend is already happening. Terminals can be pressed into service to do graphical output, but human input in a terminal interface is abysmal. Browsers do input and output about equally well (that is to say: mediocre, but not bad).
Explore and tinker in a programming language’s REPL or code notebook
I mentioned this in the previous post: I’m a fan of the notebook concept over REPLs. But even a REPL in a good language is better than a REPL in a bad language (shell script).
Use IDEs or code notebooks to provide task-oriented location
IDEs also provide the metaphor of “project directory is where you are”, and code notebooks like Jupyter do as well. You can change the current working directory in a notebook, but it rarely seems necessary because the implicit assumption is that the notebook file itself is in the directory you want to be in.
Minimize serialization logic and lean more heavily on library interfaces
This is a larger topic than I want to get into here, but the basic idea is that I think the UNIX toolbelt approach of making small, single responsibility tools that interact by piping plaintext to each other was a really good idea in the 70s, but we understand the problem domain better now and have myriad solutions for it.
Fundamentally, small tools communicating through plaintext has a few weaknesses:
It makes serialization, de-serialization and validation code a larger proportion of what your program does. It also means you lose many of the benefits of a strongly typed language like Rust. Serialization and de-serialization are the boundaries where the type checker can no longer help you, so you want to minimize those, not maximize them.
For a tool to really be usable in this way, you need to commit to backwards compatibility. This means tools have a limited ability to evolve before they become ossified in order to prevent breaking things.
awk, tr, and cut are all programs for parsing ad-hoc underspecified serialization formats. Let’s do away with the need for them by using the much richer interfaces provided by modern programming languages’ type systems and the tooling they have for expressive and flexible serde.
The alternative to small focused binaries communicating through plaintext is small focused libraries with richly typed interfaces.
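Here is a small sketch of the difference. The FileEntry type, both functions, and the sample lines are all invented for illustration; the point is only where the failure modes live.

```rust
/// A typed record, as a library might hand it back. (Hypothetical type.)
#[derive(Debug, PartialEq)]
struct FileEntry {
    name: String,
    size_bytes: u64,
}

/// The plaintext route: pick columns out of a line, much as
/// `awk '{print $1, $2}'` would. Every step can fail at runtime,
/// and the compiler cannot tell you which lines will.
fn parse_entry(line: &str) -> Option<FileEntry> {
    let mut columns = line.split_whitespace();
    let name = columns.next()?.to_string();
    let size_bytes = columns.next()?.parse().ok()?;
    Some(FileEntry { name, size_bytes })
}

/// The library route: the caller already holds typed values.
fn total_size(entries: &[FileEntry]) -> u64 {
    entries.iter().map(|entry| entry.size_bytes).sum()
}

fn main() {
    // Made-up sample data; the last line is deliberately malformed.
    let lines = ["notes.txt 120", "photo.png 4096", "broken-line"];
    let entries: Vec<FileEntry> = lines.iter().filter_map(|line| parse_entry(line)).collect();
    println!("{} bytes in {} files", total_size(&entries), entries.len());
}
```

The parser silently drops the malformed line and would mangle a file name containing a space. total_size has no such worries, because by the time it runs the data has a type.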
What’s left for the shell?
I think this leaves the shell with:
the default interface for simple programs
a place to spawn processes and provide configuration arguments
a cross-platform bootstrap environment
I hope that as WASM+WASI gains traction, it can take over the cross-platform bootstrapping capabilities. Or possibly something exotic like Actually Portable Executables will take root. But I don’t think shell scripts are a good long-term solution for “write once, run anywhere”.
Unleashing Unsh
That leaves the default interface and process spawning responsibilities. To that end, I’ve written unsh (short for Un-shell). All it does is provide a loop where you can execute programs, provide them with an argv, and see what their output is.
It doesn’t have a programming language built in, and it doesn’t allow piping output.
Is unsh useful? Well, sure, if you need those two things and nothing else. It’s designed to hint strongly to the user that if they want something more complicated, they should probably use a real programming language.
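To give a feel for how little is involved, here is a minimal sketch of a loop in that spirit. It is not unsh’s actual source; the prompt string, the parse_line helper, and the whitespace-only splitting are assumptions made for this example.

```rust
use std::io::{self, Write};
use std::process::Command;

/// Splits a line into a program name and its arguments on whitespace.
/// No quoting, globbing, variables, or pipes: a `|` is just another argument.
/// (Hypothetical helper, not taken from unsh.)
fn parse_line(line: &str) -> Option<(String, Vec<String>)> {
    let mut words = line.split_whitespace().map(String::from);
    let program = words.next()?;
    Some((program, words.collect()))
}

fn main() -> io::Result<()> {
    loop {
        print!("> ");
        io::stdout().flush()?;

        let mut line = String::new();
        // Zero bytes read means end of input (Ctrl+D), so leave the loop.
        if io::stdin().read_line(&mut line)? == 0 {
            break;
        }

        if let Some((program, args)) = parse_line(&line) {
            // The child inherits our stdin and stdout, so its output
            // scrolls by exactly as it would in a regular shell.
            if let Err(err) = Command::new(&program).args(&args).status() {
                eprintln!("{}: {}", program, err);
            }
        }
    }
    Ok(())
}
```

Everything a conventional shell adds on top of this (builtins such as cd, quoting, job control, scripting) is what the rest of this post argues can live somewhere else.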
If you want to be hardcore, try switching your login shell to unsh and see how it goes. I think people may be surprised at how far simple process launching gets you, even without the rest of a shell’s capabilities.
[1] Throughout I’m talking mostly about POSIX shells. Some of what I’m talking about is also applicable to PowerShell, but I haven’t been careful to call those places out (mostly because of my lack of familiarity).
[2] This function call style seems to have been popularized by Fortran, and maybe even earlier by Plankalkül, a language described in 1948. Though there’s some convergent evolution going on, since mathematics used the f(a, b) syntax before computers were around. If anyone knows of a comprehensive history of this syntax, let me know!
[3] For some reason it always irked me that Java required one to do System.out.println just to put a string on the console.