Bicameral, Not Homoiconic

    1 (Weak) Homoiconicity

    2 (Strong) Homoiconicity

    3 The Parsing Pipeline

    4 The Bicameral Analogy

    5 Bicameral Syntax

    6 Back to Lisps

    7 What About Other Languages?

If you spend enough time reading internet discussions of programming languages, you’ll learn that Lispy languages have a special property: they are homoiconic. This property is vested with mystical powers that both enrich Lisps and debase their competitors.

I have programmed in, and built, Lisps since the late 1980s. My blog is called “parenthetically speaking”. And yet I’m here to tell you that this term is mostly nonsense. However, there is something special—something far less mystical, but also very powerful and useful—about Lisps. It’s worth understanding what that is and transporting its essence to other languages.

1 (Weak) Homoiconicity

What, supposedly, is homoiconicity? You will hear things like: it is “a property of some programming languages that allows programs to be represented as data within the language”, or the same with “represented” replaced by “manipulated”, or, more simply, “code as data”.

Let’s tease these apart a bit. Consider the following Python code:

hello = 1

This is clearly a program. But can I represent this as a datum within the language? Sure:

'hello = 1'

is a perfectly good representation. (Well, it may be good but it’s not great; we’ll return to that!) Can I manipulate it? Sure, I can concatenate strings to create it:

'hello' + ' = ' + '1'

will produce that program, and

'hello = 1'.split(' ')

will take it apart into constituent pieces.

Does that make Python homoiconic?

Of course, there’s nothing special about Python here. We can use JavaScript to represent and manipulate JavaScript programs, C to do the same to C programs, and so on. Essentially, any programming language with a string datatype seems to be homoiconic. Heck, we didn’t even need strings: we could just as well have represented the programs as numbers (e.g., using Gödel numbering).

One of the traits of a good definition is that it be non-trivial: it must capture some things but it must also exclude some things. It’s not clear that this notion of homoiconicity excludes much of anything.

2 (Strong) Homoiconicity

But there’s a reasonable objection to what we wrote above. All we’ve done is write, combine, and take apart strings. But strings are not necessarily programs; strings are just strings, a form of data. Data are data, but programs—entities that we can run—seem to be a separate thing.

How do we turn data into programs? We do need some language support for that. We need something that will take some agreed-upon data representation of a program and treat it like a program, i.e., do whatever the program would have done. Typically, this is a function called eval: it evaluates the datum, performing the effects described in the datum, just as if it were a program. (Note that eval really treats “data as code”, not “code as data”.)
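
To make this concrete, here is a minimal sketch in Python, where this facility is split in two: eval for expressions and exec for statements:

program = 'hello = 1'

namespace = {}
exec(program, namespace)   # performs the assignment, just as if it were code
print(namespace['hello'])  # 1

print(eval('1 + 2'))       # 3: eval handles expressions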

So maybe eval is the real characteristic of homoiconic languages? Maybe. It’s certainly true that eval is a distinctive feature, and some languages have it while others don’t: that is, it non-trivially distinguishes between languages. But it’s worth noting:
  • Many languages, including Python and JavaScript, have an eval. If they’re all homoiconic, then clearly this isn’t a particularly Lispy trait.

  • eval interacts poorly with its lexical environment, making it hard to program with effectively. We showed that JavaScript’s eval is not one but four operations, with eight contexts that determine which of the four to use. This kind of complexity is overwhelming.

  • The complexity might be worth it if eval were a good idea, but it’s often a bad idea in programs! It makes code statically invisible, rendering every other aspect of program management—static analysis, compilation, security checking, and more—much, much harder (or, for some important and useful kinds of analysis, impossible).

This seems like a disappointing way to end: homoiconic languages are ones that have a complex, excessively powerful feature that we probably shouldn’t use, and that is anyway found in many languages that are not Lispy at all…which is hardly a good way to describe what makes Lispy languages distinctive.

But this just shows why we shouldn’t be talking about homoiconicity at all. Let’s talk about what’s actually interesting instead.

3 The Parsing Pipeline

Let’s talk briefly about the classical parsing pipeline. For decades, we’ve been taught to think of parsing a program as having two phases: tokenization (sometimes colloquially called “lexing”) followed by parsing (not colloquially called “yaccing”). What do they do?

A program is a list, or stream, of characters. A language implementation needs to first understand what a program is saying before it can make the program do what it says. That is, it has to understand the meaning of a program. This process of determining meaning is usually done in several stages, which make sense once we understand them in terms of their expressiveness.

A typical dataflow in the front-end of a language implementation looks like this:

Scanner --> Parser

What does this mean?

Just as you don’t read this article at the level of individual letters, given a program like hello = 1, we don’t want to deal with it at the level of each letter: h, e, l, … either. Instead, we’d like to start by thinking of it as having three tokens: hello, =, and 1. Note that in the process, we can ignore whether or not the = was surrounded by spaces (which Python does not require, but some languages do). (We’ll return to the elided spaces later!) This is the job of the scanner.

So the input list or stream of characters gets transformed, by the scanner, into an output list or stream of tokens. But as a reader, you still don’t really read the program at the level of tokens; instead, you assign meaning. That is, you presumably read the above as “assign 1 to hello”. (The actual meaning of = in Python is much more messy.) This is the job of the parser: to create a data structure representing the high-level semantics of the language. In this case, it might create an abstract syntax tree whose root is a “variable assignment”-type node with two children: one the variable being assigned (here, hello) and one the value expression (here, just 1). If the actual expression were more complicated—say the program were hello = 1 + 2—then there would be a more complicated tree representing addition with two children. (This being Python, of course the actual meaning of + is also messy.)
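
Python’s standard library happens to expose both of these stages, so we can watch them at work (a small sketch):

import ast, io, tokenize

src = 'hello = 1'

# The scanner: characters in, tokens out (the spaces around = produce no tokens).
for tok in tokenize.generate_tokens(io.StringIO(src).readline):
    print(tok.type, repr(tok.string))

# The parser: tokens in, abstract syntax tree out.
print(ast.dump(ast.parse(src)))
# Module(body=[Assign(targets=[Name(id='hello', ...)], value=Constant(value=1))], ...)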

The exact mechanisms by which this form of parsing happens are somewhat outside our scope here. For many languages, grammars are ambiguous and top-down parsing strategies—the ones you might first think of—don’t work, and parsing must instead happen bottom-up. This process is complicated, algorithmically fascinating, and often bad at giving good errors. There are very interesting trade-offs here that we often don’t pay enough attention to.

Not every language implementation follows the above pipeline. People have, for instance, experimented with parsing without using a scanner. Both conceptually and practically, however, it is still useful to have this separation:
  • The scanner usually sits at the lowest level of the Chomsky hierarchy: it is regular. It is therefore quick and efficient and its rules are easy to reason about in depth.

  • The parser, in contrast, is usually context-free or worse, making it a much more complicated beast.

We should want to take advantage of complexity theory in building our tools! This also represents a good separation of concerns, whereby we could imagine building multiple parsers—for instance, trading off speed or space (or both) for better error messages—while reusing the same scanner.

In what follows, I will take the scanner as a given. The traditional pipeline then has just one stage: the parser. I therefore call this the unicameral pipeline.

4 The Bicameral Analogy

Many countries have what is called a bicameral legislature: “bi” meaning two, “camera” meaning room. That is, there are two houses of parliament. India, for instance, has the Lok Sabha and Rajya Sabha; the UK has the House of Commons and the House of Lords; the USA has the House of Representatives and Senate. In all cases, there is a “lower” house and an “upper” house.

My goal here is not to endorse various class-related associations or to delve into the actual practices, but rather to employ this as a useful, rough analogy. A theory of bicamerality is this. The lower house is a bit more rowdy and boisterous. It filters out some of the more absurd legislation, but it allows through many bills that are perhaps not entirely wise. The upper house is more calm and deliberative; in legislatures, its members are either appointed or serve longer terms, so they do not need to bend to every populist will. The upper house therefore usefully serves as a filter.

So: we have two houses. The lower house filters out dreck, but lets through several bills, some of which are unwise. The upper house passes judgment on these bills, allowing only the ones it considers wise. Enough political science, back to computer science.

5 Bicameral Syntax

As I alluded to above, having just a parser gives you a unicameral syntax. What Lispy languages give you is a bicameral syntax. But they are not unique in this; other notations, such as XML and JSON, are also bicameral. To make my point, I will use these other notations.

First of all, all these notations have to get through a scanner. In XML, you can’t write <title without a closing >; that’s just not even a valid opening tag. In JSON, you can’t write "hi without a closing "; that’s just not a valid string. We can think of these as basic tokenization failures. So let’s assume our input passes the scanner and we get a list or stream of tokens.

Even once you’ve written proper tokens, there are still things you cannot do in XML or JSON. For instance, the following full document is not legal in XML:

<foo><bar>This is my bar</bar>

because we didn’t “close” <foo> with a </foo>; this isn’t legal either,

<foo><bar>This is my bar</foo></bar>

because we didn’t preserve the nesting structure (we should close bar before we close foo). The same is true in JSON: we can’t open a structure without closing it (or vice versa); similarly, we have to close braces and brackets in the opposite order of how we opened them.
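
These rejections are mechanical. In Python, for instance, the stock XML and JSON readers refuse such inputs outright (a quick sketch):

import json
import xml.etree.ElementTree as ET

try:
    ET.fromstring('<foo><bar>This is my bar</foo></bar>')
except ET.ParseError as e:
    print('XML:', e)    # mismatched tag

try:
    json.loads('{"foo": [1, 2}')
except json.JSONDecodeError as e:
    print('JSON:', e)   # expecting ',' delimiter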

Observe that we can make these judgments without knowing what the datum represents. There’s a reason I used names like foo and bar instead of something meaningful. That’s because it doesn’t matter: these XML and JSON data are wrong at a basic level. We say that they fail to be well-formed. This is our lower house.

However, there may be more complicated rules at play. It may be that a bar really should not reside within a foo; it may be that every baz requires one or more quuxes. If you’ve ever worked with an XML or JSON format, you know that there are all kinds of rules like this (that may or may not have been documented well!) that you need to follow. For instance, when you read the documentation of a Web API, it might tell you that a certain kind of request needs to provide certain fields, but must also not provide certain other keys. All these are called rules of validity. They build on top of—indeed, leave unstated—the well-formedness rules, and focus on what wasn’t checked there. This is the upper house.
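
To make this concrete, here is a sketch of a made-up validity rule: say an API demands that age be a number. The reader cannot catch this; only a validity check can:

import json

doc = json.loads('{"age": "ten"}')  # well-formed: the reader is satisfied

# Validity is a separate judgment, made against the (hypothetical) API's rules:
valid = isinstance(doc, dict) and isinstance(doc.get('age'), (int, float))
print(valid)                        # False: well-formed, but not valid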

This leads to a new pipeline, one where some tool performs an intermediate check for well-formedness, and another for validity. The last stage—the one that checks validity and, having confirmed it, produces terms tagged with their “parts of speech”—is still the parser. But now there’s one more tool in the middle. I don’t believe there’s a standard name for it, but there should be; following Lisp tradition, I call it the reader:

Scanner --> Reader --> Parser

Now, as written, it’s not clear what the output of the reader is. It could be that the reader just checks the tokens, and then passes on the same token list or stream to the parser, just as in the previous pipeline. But this would be a loss! The reader has the ability to construct an extremely useful intermediate data structure. The main property the reader looks for is that “things match up and nest properly”. In other words, the reader is confirming that the input is a tree. Thus, instead of passing on a dumb, flat list of tokens to the parser, it constructs trees.

This greatly simplifies the job of the parser for two reasons. At a more basic level, some of the checking that the parser previously had to do is already done. More interestingly, tokens are very low-level; trees are high-level. It’s much easier to write a recursive function that checks for validity and produces abstract syntax trees when already given well-formed trees as an input! (For this reason, PLAI calls this the “Primus Inter Parsers”, a joke that seems lost on most.)
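
As a sketch, imagine a little arithmetic language written in JSON (a format made up here for illustration). json.loads acts as the reader, and the parser is just a small recursive function over the tree it returns:

import json

# The reader: rejects ill-formed input, hands us a tree.
tree = json.loads('["+", 1, ["*", 2, 3]]')

# The parser: checks validity and produces abstract syntax.
def parse(t):
    if isinstance(t, (int, float)):
        return ('num', t)
    if isinstance(t, list) and len(t) == 3 and t[0] in ('+', '*'):
        return ('binop', t[0], parse(t[1]), parse(t[2]))
    raise SyntaxError(f'well-formed, but not valid: {t!r}')

print(parse(tree))
# ('binop', '+', ('num', 1), ('binop', '*', ('num', 2), ('num', 3)))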

Again, complexity theory is a useful guide, and what we have done is steadily ratchet up the complexity:
  • The scanner is still regular, with all the attendant benefits and weaknesses.

  • The reader is context-free. It does not need to do more than that.

  • The parser is now freed of basic context-free checks, and can focus on other context-free and, in particular, context-sensitive checks.

And once again, we can exploit this separation of concerns to swap out other implementations if we need to.

This, then, is a bicameral syntax. The reader is the “lower house”; it rejects basic mistakes, but still lets through some invalid things. But it also presents them wrapped up in a cleaner, simpler structure to work with. The parser is the “upper house”; it receives this clean structure and can look for deeper flaws in it, only allowing those things that pass its more exacting scrutiny.

The advantages to a bicameral syntax are many:
  • We get a greater separation of concerns, making it easier both to understand and to swap out components.

  • We get to more gradually walk up the complexity hierarchy.

  • The reader sits at a key intermediate level, turning tokens into trees, thereby greatly simplifying the parser’s task while itself being quite easy to write.

This simplicity of (correct!) implementation matters! It doesn’t matter as much when you have only one implementation of a language. However, a lot of our traffic in computing is best represented as trees: they are much more powerful than strings, but also simple enough to write and read. It is therefore no surprise that bicameral syntaxes like JSON have become ubiquitous. Every language now needs a reader for them, and these readers have to implement the same language, so simplicity of specification and of implementation are a virtue. In turn, many systems need to separately validate a JSON input; again, the simplicity of writing this kind of (tree-consuming) parser is a big win.

This additional level also has other tooling advantages. It’s not only language implementations that need to support syntax; so do editors. It’s a lot easier to support matching, indentation, coloring, and so on at the well-formedness level. And it’s easy to indent trees! It’s also easy to traverse them: there are meaningful generic tree-level operations (“go to the opening”, “go to the closing”, “go up a level”, “go down a level”, etc.) that can be applied to all languages with the same well-formedness characteristic (e.g., JSON). And again, we have lots of editors, and they each need to support the same syntaxes; bicameral languages, by providing the intermediate notion of well-formedness, let tools hit the trifecta of correct, useful, and relatively easy.

These advantages are offset by one drawback: some people just don’t like them. It feels constraining to some to always write programs in terms of trees, rather than more free-form syntax. Even in this bicameral space, there are trade-offs: XML requires you to repeat tags at closing, which improves error-checking and catches errors in refactoring, but makes authoring painful; Lisps don’t require tag repetition but, traditionally, use only one kind of parenthesis, making structure harder to see; JSON mixes delimiters while avoiding closing tags, which seems to be some kind of sweet-spot (because it still provides some checking against editing errors). Still, what people are willing to embrace for writing data seems to irk them when writing programs, leading to the long-standing hatred for Lispy syntaxes.

6 Back to Lisps

We started with Lisp, so let’s go back there. What is Lisp? Lisp is a feeling, an emotion, a sentiment; Lisp is a vibe; Lisp is the dew on morning grass, it’s the scent of pine wafting on a breeze, it’s the sound of a cricket ball on a bat, it’s the…oh, wait, where was I. Sorry.

Okay, seriously. Note that I keep writing “Lispy languages”: both the “y” and “[language]s” should have stood out by now. What do I mean by this?

I’m not referring to a particular language: Common Lisp, say. I’m referring to a family of languages, which includes Common Lisp but also Scheme and Racket and Clojure and many others. This family is bound by some cultural antecedents, but the individual languages can be wildly different. What they share is a philosophy of syntax. And now that we’ve fleshed out that philosophy, if we are being generous, we could allow that any bicameral language is at its essence “Lispy”.

The power of this philosophy is that the language platform can implement its bicameral syntax once, and well, and lots of languages can inherit it. We noted above that there are many (data) languages built atop JSON, each of which has its own validity rules. We can likewise build many data and programming languages atop any bicameral syntax; they will share some familial traits but differ quite a bit in the details.

The bicamerality of Lisps, however, leads to some misconceptions:
  1. It does not dictate a particular implementation strategy. For instance, the argument in Lisp sometimes goes, “code is data because code is represented using s-expressions, which are also Lisp data”. A full tutorial of what these terms mean is beyond what I want to get into here. What I will just say is that there are Lispy languages where this is not true at all.

    The one I know best is Racket, because I originally built some of these parts of it. In Racket, code is not represented using s-expressions. This is because “code” requires many more things: it needs, for instance, to record source locations so that it can report errors (remember those whitespaces we said we could ignore? We can’t if we want to report errors!); it needs hygiene information to avoid inadvertent capture; and more. Therefore, Racket has a notion of a syntax object (which, yes, is a datum) that captures this much richer notion of “program source”. Syntax is a separate type that happens to parallel (and enrich) s-expressions and can be interconverted with them, but it is not an s-expression. So, while there is a large family of functions that operate on (say) lists, and s-expressions include lists, none of those functions will apply to syntax.

  2. People will sometimes say that the read primitive “parses”. It does not: it reads. It “parses” inasmuch as it confirms that the input is well-formed, but it is not the parser of convention—one that determines validity according to context-sensitive rules, and identifies “parts of speech”—so it is false to say that Lisps ship with a parser.

    To make this concrete, here is a well-formed Lispy term that read has no problem with: (lambda 1). That is, however, a syntax error in most Lisps: a determination made by the parser, not the reader. Of course, nothing prevents us from creating a new language where that term has some meaning. That language’s parser would be responsible for making sense out of the pieces. (A small sketch of this split appears right after this list.)

  3. People sometimes believe you can only have macros if you have a bicameral syntax. This isn’t true: all you need is a way to represent code, because macros transform code to code. But as more languages have realized the benefits of having macros, some have added procedural macros (an idea that dates back decades in Lisp lore). Bicameral syntaxes just provide a gateway to gloriously rich macro systems, because we’ve learned that the well-formedness layer is an especially good layer atop which to build macros.
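
Here is the sketch promised in point 2: a toy s-expression reader in Python (tokens pre-split, no error handling) that cheerfully reads (lambda 1), next to a parser-side check that rejects it:

# The reader: any properly nested parentheses pass.
def read(tokens):
    tok = tokens.pop(0)
    if tok == '(':
        children = []
        while tokens[0] != ')':
            children.append(read(tokens))
        tokens.pop(0)   # consume ')'
        return children
    return tok

tree = read('( lambda 1 )'.split())
print(tree)             # ['lambda', '1']: the reader is happy

# The parser is where (lambda 1) dies: lambda needs a parameter list.
def parse(t):
    if isinstance(t, list) and t and t[0] == 'lambda':
        if len(t) != 3 or not isinstance(t[1], list):
            raise SyntaxError('lambda: expected (lambda (args ...) body)')
    return t            # a real parser would keep going, of course

try:
    parse(tree)
except SyntaxError as e:
    print(e)            # lambda: expected (lambda (args ...) body)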

7 What About Other Languages?

I want to end with some forward-looking comments.

One of my frequent sources of amusement—and frustration—is to search for people asking “How do I build a programming language?” and reading the advice they get. These conversations invariably get mired in syntax issues. But by the time the novice gets through building all the knick-knacks of syntax and learning to build parsers, they’ve usually exhausted their complexity, cognitive, and time budget. The resulting language either never gets done or is a semantic mess.

I would like to suggest another path forward: start with a bicameral syntax. Quickly get to the point where you can think about semantics. What does your language do? How does one reason about programs? How does one run them? Do all the hard and interesting and important parts first.

But, you argue, “Now I have a bicameral syntax! Nobody will want to program in it!” And that may be true. But I want you to consider the following perspective:

Syntax is a view.

One of the enduring lessons we’ve learned from software design is the model-view separation: that is, there are often many useful ways to view the same thing, some of which we haven’t even dreamt up yet, so it’s wise to separate an object from how we view it (and coordinate the different views). This principle seems to virtually never be applied to programs, but it should! We should view the abstract syntax as the model—the “one true program”—and every syntax as a view on it.

So what you should really do is define an abstract syntax, and layer a light bicameral concrete syntax on top of it. Then build all the other parts. And then, think about what views you want to provide. Some people may be happy with the bicameral syntax. Others may want a more traditional infix syntax. Some may want braces, some may want significant whitespace. Some users may even want to use blocks—the ultimate bicameral syntax. You can (and should) view syntaxes as the fun desserts that you get to design and deploy once the hard work is done. (This is not to say syntax is not hard; it’s actually harder than most people think, because it requires a delicate interplay between mathematics and human factors.) Both F*dging up a Racket and, in much more depth, Beautiful Racket illustrate this principle practically and beautifully.

This perspective also gives you a useful artifact: a bicameral syntax that is a very nice target for programs that need to generate programs in your language. This is no longer a new idea, so you don’t have to feel radical: formats like SMT-LIB and the WebAssembly text format are s-expressions for a reason. Our Forge tool supports both a syntax based on Alloy and one in s-expressions: the former for (most) people to use, the latter as a target for tools that use Forge as a back-end.

In short, I hope this inspires a new vision of syntax design. Bicamerality is the best intermediate language we’ve come up with; it’s driven by both theoretical elegance and practical experience. For some, the bicameral syntax is also a lovely source language, but it should be just one of many views onto the core abstract syntax. Tools will become more reusable, and at the same time, the syntax wars can end. We can find peace, joy, and happiness. It is not a coincidence that a pair of parentheses looks like an embrace.