Bicameral, Not Homoiconic
If you spend enough time reading internet discussions of programming languages, you’ll learn that Lispy languages have a special property: they are homoiconic. This property is vested with mystical powers that both enrich Lisps and debase its competitors.
I have programmed in, and built, Lisps since the late 1980s. My blog
is called “parenthetically speaking”. And yet I’m here to tell you
that this term is mostly nonsense. However, there is something
special—
1 (Weak) Homoiconicity
What, supposedly, is homoiconicity? You will hear things like: the property that “a property of some programming languages that allows programs to be represented as data within the language”, or with “represented” substituted by “manipulated”, or more simply as “code as data”.
hello = 1 |
'hello = 1' |
'hello' + ' = ' + '1' |
'hello = 1'.split(' ') |
Does that make Python homoiconic?
Of course, there’s nothing special about Python here. We can use JavaScript to represent and manipulate JavaScript programs, C to do the same to C programs, and so on. Essentially, any programming language with a string datatype seems to be homoiconic. Heck, we didn’t even need strings: we could just as well have represented the programs as numbers (e.g., using Gödel numbering).
One of the traits of a good definition is that it be non-trivial: it must capture some things but it must also exclude some things. It’s not clear that this notion of homoiconicity excludes much of anything.
2 (Strong) Homoiconicity
But there’s a reasonable objection to what we wrote above. All that
we’ve done is written, combined, and taken apart strings. But
strings are not necessarily programs; strings are just strings,
a form of data. Data are data, but programs—
How do we turn data into programs? We do need some language support for that. We need something that will take some agreed-upon data representation of a program and treat it like a program, i.e., do whatever the program would have done. Typically, this is a function called eval: it evaluates the datum, performing the effects described in the datum, just as if it were a program. (Note that eval really treats “data as code”, not “code as data”.)
Many languages, including Python and JavaScript, have an eval. If they’re all homoiconic, then clearly this isn’t a particularly Lispy trait.
eval interacts poorly with its lexical environment, thereby making it hard to even program with effectively. We showed that JavaScript’s eval is not one but four operations and there are eight contexts that determine which of the four to use. This kind of complexity is overwhelming.
The complexity might be worth it if eval were a good idea, but it’s often a bad idea in programs! It makes code statically invisible, making every other aspect of program management—
static analysis, compilation, security checking, and more— much, much harder (or, for some important and useful kinds of analysis, impossible).
But this just shows why we shouldn’t be talking about homoiconicity at all. Let’s talk about what’s actually interesting instead.
3 The Parsing Pipeline
Let’s talk briefly about the classical parsing pipeline. For decades, we’ve been taught to think of parsing a program as having two phases: tokenization (sometimes colloquially called “lexing”) followed by parsing (not colloquially called “yaccing”), and more. What do they do?
A program is a list, or stream, of characters. A language implementation needs to first understand what a program is saying before it can make the program do what it says. That is, it has to understand the meaning of a program. This process of determining meaning is usually done in several stages, which make sense once we understand them in terms of their expressiveness.
A typical dataflow in the front-end of a language implementation looks like this:
What does this mean?
Just as you don’t read this article at the level of individual letters, given a program like hello = 1, we don’t want to deal with it at the level of each letter: h, e, l, … either. Instead, we’d like to start by thinking of it as having three tokens: hello, =, and 1. Note that in the process, we can ignore whether or not the = was surrounded by spaces (which Python does not require, but some languages do).We’ll return to the elided spaces later! This is the job of the scanner.
So the input list or stream of characters gets transformed, by the
scanner, into an output list or stream of tokens. But as a reader, you
still don’t really read the program at the level of tokens; instead,
you assign meaning. That is, you presumably read the above as
“assign hello to 1”.The actual meaning of = in
Python is much more messy. This is the job of the parser: to
create a data structure representing the high-level semantics of the
language. In this case, it might create an abstract syntax tree of a
“variable assignment”-type node with two children, one the variable being
assigned (here, hello) and one the value expression (here, just
1). If the actual expression were more complicated—
The exact mechanisms by which this form of parsing happens is somewhat
outside scope here. For many languages, grammars are ambiguous and
top-down parsing strategies—
The scanner usually sits at the lowest level of the Chomsky hierarchy: it is regular. It is therefore quick and efficient and its rules are easy to reason about in depth.
The parser, in contrast, is usually context-free or worse, making it a much more complicated beast.
In what follows, therefore, I will take the scanner as a given. Thus, the traditional pipeline then has just one stage: the parser. I therefore call this the unicameral pipeline.
4 The Bicameral Analogy
Many countries have what is called a bicameral legislature: “bi” meaning two, “camera” meaning room. That is, there are two houses of parliament. India, for instance, has the Lok Sabha and Rajya Sabha; England has the House of Commons and House of Lords; the USA has the House of Representatives and Senate. In all cases, there is a “lower” house and “upper” house.
My goal here is not to endorse various class-related associations or to delve into the actual practices, but rather to employ this as a useful, rough analogy. A theory of bicamerality is this. The lower house is a bit more rowdy and boisterous. It filters out some of the more absurd legislations, but it allows through many that are perhaps not entirely wise. The upper house is more calm and deliberative; in legislatures, its members are either appointed or serve longer terms, so they do not need to bend to every populist will. Therefore, the upper house usefully serves as a filter.
So: we have two houses. The lower house filters out dreck, but lets through several bills, some of which are unwise. The upper house passes judgment on these bills, allowing only the ones it considers wise. Enough political science, back to computer science.
5 Bicameral Syntax
As I’ve alluded above, having just a parser gives you a unicameral syntax. What Lispy languages give you is a bicameral syntax. But they are not unique in this; other notations, such as XML and JSON, are also bicameral. To make my point, I will use these other notations.
First of all, all these notations have to get through a scanner. In XML, you can’t write <title without a closing >; that’s just not even a valid opening tag. In JSON, you can’t write "hi without a closing "; that’s just not a valid string. We can think of these as basic tokenization failures. So let’s assume our input passes the scanner and we get a list or stream of tokens.
<foo><bar>This is my bar</bar> |
<foo><bar>This is my bar</foo></bar> |
Observe that we can make these judgments without knowing what the datum represents. There’s a reason I used names like foo and bar instead of something meaningful. That’s because it doesn’t matter: these XML and JSON data are wrong at a basic level. We say that they fail to be well-formed. This is our lower house.
However, there may be more complicated rules at play. It may be that
a bar really should not reside within a foo; it may be that
every baz requires one or more quuxes. If you’ve ever worked
with an XML or JSON format, you know that there are all kinds of rules
like this (that may or may not have been documented well!) that you
need to follow. For instance, when you read the documentation of a Web
API, it might tell you that a certain kind of request needs to provide
certain fields, but must also not provide certain other
keys. All these are called rules of validity. They build on top
of—
This leads to a new pipeline, one where some tool performs an
intermediate check for well-formedness, and another for validity. The
last stage—
Now, as written, it’s not clear what the output of the reader is. It could be that the reader just checks the tokens, and then passes on the same token list or stream to the parser, just as in the previous pipeline. But this would be a loss! The reader has the ability to construct an extremely useful intermediate data structure. The main property the reader looks for is that “things match up and nest properly”. In other words, the reader is confirming that the input is a tree. Thus, instead of passing on a dumb, flat list of tokens to the parser, it constructs trees.
This greatly simplifies the job of the parser for two reasons. At a more basic level, some of the checking that the parser previously had to do is already done. More interestingly, tokens are very low-level; trees are high-level. It’s much easier to write a recursive function that checks for validity and produces abstract syntax trees when already given well-formed trees as an input!For this reason, PLAI calls this the “Primus Inter Parsers”, a joke that seems lost on most.
The scanner is still regular, with all the attendant benefits and weaknesses.
The reader is context-free. It does not need to do more than that.
The parser is now freed of basic context-free checks, and can focus on other context-free and, in particular, context-sensitive checks.
This, then, is a bicameral syntax. The reader is the “lower house”; it rejects basic mistakes, but still lets through some invalid things. But it also presents them wrapped up in a cleaner, simpler structure to work with. The parser is the “upper house”; it receives this clean structure and can look for deeper flaws in it, only allowing those things that pass its more exacting scrutiny.
We get a greater separation of concerns, making it easier to both understand and to swap out components.
We get to more gradually walk up the complexity hierarchy.
The reader sits at a key intermediate level, turning tokens into trees, thereby greatly simplifying the parser’s task while itself being quite easy to write.
This simplicity of (correct!) implementation matters! It doesn’t matter as much when you have only one implementation of a language. However, a lot of our traffic in computing is best represented as trees: they are much more powerful than strings, but also simple enough to write and read. It is therefore no surprise that bicameral syntaxes like JSON have become ubiquitous. Every language now needs a reader for them, and these readers have to implement the same language, so simplicity of specification and of implementation are a virtue. In turn, many systems need to separately validate a JSON input; again, the simplicity of writing this kind of (tree-consuming) parser is a big win.
This additional level also has other tooling advantages. It’s not only language implementations that need to support syntax; so do editors. It’s a lot easier to support matching, indentation, coloring, and so on at the well-formedness level. And it’s easy to indent trees! It’s also easy to traverse them: there are meaningful generic tree-level operations (“go to the opening”, “go to the closing”, “go up a level”, “go down a level”, etc.) that can be applied to all languages with the same well-formedness characteristic (e.g., JSON). And again, we have lots of editors, and they each need to support the same syntaxes; bicameral languages, by providing the intermediate notion of well-formedness, let tools hit the trifecta of: correct, useful, and relatively easy.
These advantages are offset by one drawback: some people just don’t like them. It feels constraining to some to always write programs in terms of trees, rather than more free-form syntax. Even in this bicameral space, there are trade-offs: XML requires you to repeat tags at closing, which improves error-checking and catches errors in refactoring, but makes authoring painful; Lisps don’t require tag repetition but, traditionally, use only one kind of parenthesis, making structure harder to see; JSON mixes delimiters while avoiding closing tags, which seems to be some kind of sweet-spot (because it still provides some checking against editing errors). Still, what people are willing to embrace for writing data seems to irk them when writing programs, leading to the long-standing hatred for Lispy syntaxes.
6 Back to Lisps
We started with Lisp, so let’s go back there. What is Lisp? Lisp is a feeling, an emotion, a sentiment; Lisp is a vibe; Lisp is the dew on morning grass, it’s the scent of pine wafting on a breeze, it’s the sound of a cricket ball on a bat, it’s the…oh, wait, where was I. Sorry.
Okay, seriously. Note that I keep writing “Lispy languages”: both the “y” and “[language]s” should have stood out by now. What do I mean by this?
I’m not referring to a particular language: Common Lisp, say. I’m referring to a family of languages, which includes Common Lisp but also Scheme and Racket and Clojure and many others. This family is bound by some cultural antecedents, but the individual languages can be wildly different. What they share is a philosophy of syntax. And now that we’ve fleshed out that philosophy, if we are being generous, we could allow that any bicameral language is at its essence “Lispy”.
The power of this philosophy is that the language platform can implement its bicameral syntax once, and well, and lots of languages can inherit it. We noted above that there are many (data) languages built atop JSON, each of which has its own validity rules. We can likewise build many data and programming languages atop any bicameral syntax; they will share some familial traits but differ quite a bit in the details.
It does not dictate a particular implementation strategy. For instance, the argument in Lisp sometimes goes, “code is data because code is represented using s-expressions, which are also Lisp data”. A full tutorial of what these terms means is beyond what I want to get into here. What I will just say is that there are Lispy languages where this is not true at all.
The one I know best is Racket, because I originally built some of these parts of it. In Racket, code is not represented using s-expressions. This is because “code” requires many more things: it needs, for instance, to record the source locations so that it can use it to report errors;Remember those whitespaces we said we could ignore? We can’t if we want to report errors! it needs hygiene information to avoid inadvertent capture; and more. Therefore, Racket has a notion of syntax object (which, yes, is a datum) that captures this much richer notion of “program source”. Syntax is a separate type that happens to parallel (and enrich) s-expressions and can be interconverted with them, but it is not an s-expression. So, while there is a large family of functions that operate on (say) lists, and s-expressions include lists, none of those functions will apply to syntax.
People will sometimes say that the read primitive “parses”. It does not: it reads. It “parses” inasmuch as it confirms that the input is well-formed, but it is not the parser of convention—
one that determines validity according to context-free and context-sensitive rules, and identifies “parts of speech”— so it is false to say that Lisps ship with a parser. To make this concrete, here is a well-formed Lispy term that read has no problem with: (lambda 1). That is, however, a syntax error in most Lisps: a determination made by the parser, not the reader. Of course, nothing prevents us from creating a new language where that term has some meaning. That language’s parser would be responsible for making sense out of the pieces.
People sometimes believe you can only have macros if you have a bicameral syntax. This isn’t true: all you need is a way to represent code, because macros transform code to code. But as more languages have realized the benefits of having macros, some have added procedural macros (an idea that dates back decades in Lisp lore). Bicameral syntaxes just provide a gateway to gloriously rich macro systems, because we’ve learned that the well-formedness layer is an especially good layer atop which to build macros.
7 What About Other Languages?
I want to end with some forward-looking comments.
One of my frequent sources of amusement—
I would like to suggest another path forward: start with a bicameral syntax. Quickly get to the point where you can think about semantics. What does your language do? How does one reason about programs? How does one run them? Do all the hard and interesting and important parts first.
But, you argue, “Now I have a bicameral syntax! Nobody will want to program in it!” And that may be true. But I want you to consider the following perspective:
Syntax is a view.
One of the enduring lessons we’ve learned from software design is the
model-view separation: that is, there are often many useful ways to
view the same thing, some of which we haven’t even dreamt up yet, so
it’s wise to separate an object from how we view it (and coordinate
the different views). This principle seems to virtually never be
applied to programs, but it should! We should view the abstract
syntax as the model—
So what you should really do is define an abstract syntax, and layer a
light bicameral concrete syntax on top of it. Then build all the other
parts. And then, think about what views you want to provide. Some
people may be happy with the bicameral syntax. Others may want a more
traditional infix syntax. Some may want braces, some may want
significant whitespace. Some users may even want to use blocks—
This perspective also gives you a useful artifact: a bicameral syntax that is a very nice target for programs that need to generate programs in your language. This is no longer a new idea, so you don’t have to feel radical: formats like SMT-LIB and WebAssembly text format are s-expressions for a reason. Our Forge tool supports both an syntax based on Alloy and one in s-expressions, the former for (most) people to use, the latter a target to tools that use Forge as a back-end.
In short, I hope this inspires a new vision of syntax design. Bicamerality is the best intermediate language we’ve come up with; it’s driven by both theoretical elegance and practical experience. For some, the bicameral syntax is also a lovely source language, but it should be just one of many views onto the core abstract syntax. Tools will become more reusable, and at the same time, the syntax wars can end. We can find peace, joy, and happiness. It is not a coincidence that a pair of parentheses looks like an embrace.