From the moment I started paying attention to the tech industry, Paul Graham was there. My first job out of college was in SoMa, around the corner from the Justin.tv offices, and his essays were just floating around in the ether, impossible to ignore. His popularization of Lisp was a small part of why I tried Clojure, and a big part of why Clojure was successful.
I recognized that he had a tendency towards self-aggrandizement and awkward flattery of his readers, but at worst he seemed harmless. As his writing became increasingly focused on startups, and I became increasingly sure I didn’t want to be a founder, he simply drifted out of view.
Recently, however, his writing has taken a reactionary turn which is hard to ignore. He’s written about the need to defend “moderates” from bullies on the “extreme left”, asserted that “the truth is to the right of the median” because “the left is culturally dominant,” and justified Coinbase’s policy to ban discussion of anything deemed “political” by saying that it “will push away some talent, yes, but not very talented talent.”
I went back to the essays I had read a decade before, to see if I had missed something. It turned out that I had. There was a consistent intellectual framework underpinning all his writing, from his very first essays on Lisp and language design. In many ways, those early essays contained the clearest articulation of his framework; it just took me ten years to see it.
In April 2001, six years after the release of the Java language, Paul Graham weighed in:
He followed up with a number of observations about Java, such as the fact that it was designed by committee, infantilized its users, and owed its initial success to an ailing corporate sponsor. But, he wrote, this wasn’t an analysis of Java so much as introspection on his own “hacker’s radar”. This radar was the aesthetic response of an expert programmer, drawn from epiphenomena surrounding the language and the opinions of other experts in his social circle.
A month later, in an essay on why languages are popular, he doubled down on the importance of his personal intuition:
Programming languages are for hackers, and a programming language is good as a programming language (rather than, say, an exercise in denotational semantics or compiler design) if and only if hackers like it.
The quality of a language can only be judged by experts (“a tiny minority, admittedly, but that tiny minority write all the good software”), and adoption by those experts will drive adoption by everyone else. Ultimately, “a programming language probably becomes about as popular as it deserves to be.”
The context for both essays was that Graham was creating his own language, Arc. He wanted it to be popular, and was working out how to make that happen.
The first intrinsic driver of popularity he named was “brevity”:
He called back to this when discussing the importance of libraries, which can reduce any program to a single invocation:
Of course the ultimate in brevity is to have the program already written for you, and merely to call it. And this brings us to what I think will be an increasingly important feature of programming languages: library functions. Perl wins because it has large libraries for manipulating strings.
It appears that Graham was referring to Perl’s core library functions, not the much larger set of library functions that were even then available via CPAN, because he placed this responsibility wholly upon the shoulders of the language designer, saying language design would become increasingly focused on “how to design great libraries.”
Graham named other drivers of popularity in this essay, but he returned to brevity again and again over the next year, culminating in an essay on the singular importance of brevity, now dubbed “succinctness”:
Drawing from studies that found “programmers seemed to generate about the same amount of code per day regardless of the language”, he declared that “the only way to get software written faster was to use a more succinct language”.
Nowhere, however, did he mention libraries. The next month, he explained that given a sufficiently succinct language, users could simply write their own libraries:
As for libraries, their importance also depends on the application. For less demanding problems, the availability of libraries can outweigh the intrinsic power of the language. Where is the breakeven point? Hard to say exactly, but wherever it is, it is short of anything you’d be likely to call an application. If a company considers itself to be in the software business, and they’re writing an application that will be one of their products, then it will probably involve several hackers and take at least six months to write. In a project of that size, powerful languages probably start to outweigh the convenience of pre-existing libraries.
This was a significant departure from his earlier writings. Only a year before, he had stated that “[i]t’s hard to design good libraries. It’s not simply a matter of writing a lot of code.” He had emphasized that library design was a key part of language design, and even a year later he would tell us “[d]esign usually has to be under the control of a single person to be any good.”
Now he argued that the language designer need only provide a barebones language of sufficient brevity, and all else would follow. Library design can’t be both critically important and an incidental part of someone else’s six-month software project. Despite this, Graham never mentioned libraries again.
A year later, he explained that Arc was trying to be a “hundred-year language”. “It may seem presumptuous,” he wrote, but “[l]anguages evolve slowly because they’re not really technologies. Languages are notation. A program is a formal description of the problem you want a computer to solve for you.” He asserted that the most important parts of the language were the “fundamental operators”, because the rest of the language “could in principle be written in terms of these fundamental operators”.
What, then, makes a language ready for the 22nd century? Certainly not any concerns about performance, since “[e]ven if [computers] only end up being a paltry million times faster, that should change the ground rules for programming languages substantially.” Not data structures, since they’re just a premature optimization of the humble list. Not a mechanism (or even notation) for parallel computation, since a simple description of the problem will “ignore any advantages to be got from parallel computation, just as they will ignore advantages to be got from specific representations of data.”
A hundred-year language should, however, be succinct. First “write down the program you’d like to be able to write, regardless of whether there is a compiler that can translate it or hardware that can run it.” And of course the program you’d really like to write is the shortest one possible:
“If we had the hundred-year language now,” Graham wrote, “it would at least make a great pseudocode.” Confusingly, he asserted that since it would need to perform well on its million-times-faster future processor, “presumably it could generate code efficient enough to run acceptably well on our hardware.”
“When you see these ideas laid out like that,” he wrote, “it’s hard not to think, why not try writing the hundred-year language now?”
Four years later, in 2008, Arc was released. It was a Lisp-1 with shorter names and fewer parentheses than most other Lisps, and some reader macros to make anonymous functions easier to define. All primitives were defined in terms of MzScheme, a different Lisp, which provided the compiler and other tooling. It also came with a barebones web framework, built atop a continuation-based MzScheme framework that had been around since 2001.
It was, in all, underwhelming. There were many paths that could have led Graham to his professed goals, and he took none of them.
He had written that strings were premature optimization, and should be replaced by lists of characters. If he had done so, and made the characters full Unicode code points, Arc could have been one of the few languages not suffering from a half-century hangover stretching all the way back to EBCDIC. Instead, the initial release used byte strings which only supported the ASCII character set.
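The difference between the two representations is easy to see in miniature. Here is a sketch in Python (standing in for the Lisp Graham might have written) of a string as a list of Unicode code points, per his “strings are just lists” position, versus the raw byte strings Arc’s initial release actually used:

```python
# "naïve" as a list of Unicode code points: one element per character,
# regardless of how the character is encoded on disk or on the wire.
s = "naïve"
as_code_points = [ord(c) for c in s]   # 5 elements, one per character
as_bytes = list(s.encode("utf-8"))     # 6 bytes: 'ï' takes two in UTF-8

len(as_code_points)  # -> 5
len(as_bytes)        # -> 6
```

A byte-string representation conflates the encoding with the text; indexing into it can land in the middle of a character, which is exactly the class of bug an ASCII-only language silently defers rather than solves.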
Graham had asked “[h]ow many times have you heard hackers speak fondly of how in, say, APL, they could do amazing things with just a couple lines of code? I think anything that really smart people really love is worth paying attention to.” But the undeniably succinct primitives and composition rules of array programming languages were nowhere to be found.
Server-based deployment of software was a central theme in Graham’s essays, and Arc’s continuation-based web framework was an interesting and fairly novel way to create continuity across multiple requests in a single session. But since each link on the page was a continuation, and each continuation was stored in-memory in a single process, this created a single, memory-hungry point of failure. For years, Hacker News would simply display “unknown or expired link” if you waited too long to click a link. If Arc had its own runtime, it could have supported durable closures, or any number of other things that would have made this approach robust and scalable. Instead, Arc remains a thin gloss atop another Lisp’s runtime, and Hacker News (now running on a machine with enough memory to stay afloat) remains the only meaningful deployment of Arc in existence.
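The failure mode follows directly from the design. A minimal sketch of the continuation-per-link pattern (illustrative only, not Arc’s actual implementation) makes it plain: each rendered link registers a closure in process memory, keyed by an opaque id embedded in the URL, so anything that evicts that memory invalidates every outstanding link.

```python
import uuid

# Each rendered link stores a closure in process memory,
# keyed by an opaque id embedded in the URL.
continuations = {}

def make_link(closure):
    """Register a closure and return the link id a page would embed."""
    link_id = uuid.uuid4().hex
    continuations[link_id] = closure
    return link_id

def follow_link(link_id):
    """Simulate a request: run the stored continuation, if it survives."""
    closure = continuations.get(link_id)
    if closure is None:
        return "unknown or expired link"   # what Hacker News showed
    return closure()

state = {"count": 0}
def next_page():
    state["count"] += 1
    return f"page {state['count']}"

link = make_link(next_page)
follow_link(link)        # works while the process is alive
continuations.clear()    # a restart or memory eviction wipes every link
follow_link(link)        # -> "unknown or expired link"
```

Because the table lives in one process, it can’t be load-balanced, can’t survive a restart, and grows without bound unless old entries are expired — which is precisely what the error message reported.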
In rejecting parallel computation as “premature optimization”, Graham also seemed to have eschewed any consideration of concurrency primitives, which are necessarily “fundamental operators” and critically important to writing clear, concise network-facing software.
My favorite near-miss is this observation Graham made in December 2001, only months after he began development on Arc:
He had noticed that immutable maps are a useful data structure. Building on this, he might have found a paper published the previous year describing an efficient implementation for immutable maps, and made those a foundational data representation in his language. That, in any case, is what Rich Hickey did when creating Clojure, which was released three months before Arc and became the most widely-used Lisp ever made. Instead, assoc-lists remain a list-backed data structure which, Arc’s documentation informs us, makes them “inefficient for large numbers of entries”.
Arc’s release was greeted by widespread disappointment. This reception, Graham wrote, caused him to realize that his design process almost guaranteed a “contemptuous initial reaction”:
What his critics couldn’t see was that he always began “with a very crude version 1” and then “iterat[ed] rapidly”. He had done it with his startup, with his new investment fund, and even with his essays. In every case, he had proven his critics wrong.
“[L]aunch as soon as you can,” he wrote, “so you start learning from users what you should have been making.” It’s unclear what Graham learned, but he stopped working on Arc the following year.
It’s remarkable how much time Graham’s essays on language design spend narrowing the scope of his attention, and how little they spend delving into what remains. For all the words justifying brevity as his core focus, his only analysis of the nature of brevity itself is that it corresponds to the size of the parse tree rather than the number of characters. The fact that any program can be reduced down to (execute-my-program) isn’t grappled with, presumably because libraries were also deemed out of scope.
A more serious analysis of brevity might define it as “the entropy of a parse tree recursively inlined/macroexpanded down to language primitives.” This suggests our focus ought to be on two questions: how compressible are our primitives, and how can we enable users to achieve something close to optimal compression? Since the first question has a relative measure (what’s the compression ratio for our expanded parse tree?), we could productively iterate on both better primitives and better tools for abstracting over them.
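One rough way to operationalize that relative measure, sketched here in Python with its own parser and zlib as stand-ins (the function name and the measure are my illustrative assumptions, not anything Graham proposed): serialize the parse tree and see how well it compresses. A highly repetitive tree compresses well, suggesting the program is longer than its information content requires — a signal that the language is missing an abstraction.

```python
import ast
import zlib

def compression_ratio(source):
    """Serialize the parse tree and measure its compressibility.
    A low ratio means the tree is highly redundant: the program
    repeats structure the language gave it no way to factor out."""
    tree_text = ast.dump(ast.parse(source)).encode()
    return len(zlib.compress(tree_text)) / len(tree_text)

repetitive = "x = 1\n" * 50                          # one statement, fifty times
varied = "\n".join(f"x{i} = {i}" for i in range(50)) # fifty distinct statements

compression_ratio(repetitive) < compression_ratio(varied)  # -> True
```

The absolute numbers mean little, but the comparison is the point: it gives language designers a dial to iterate against, rather than a radar to consult.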
But Graham’s analysis of brevity, and indeed of all language design, was fundamentally unserious. He wasn’t interested in a rigorous definition of brevity, because the ultimate measure of a language’s quality was still his hacker’s radar. All of his essays, and Arc itself, were just spokes around that central hub. If his essays sometimes disagreed, or if Arc didn’t reflect his essays, it’s hardly surprising; their only connection was they all, in the moment, seemed right and true to Paul Graham.
Graham was, undeniably, an expert programmer. He was also a skilled technical writer and among the first to realize that using a website to build websites was a good idea. By the time he began work on Arc, his technical intuition had already taken him far in life. This time, however, it didn’t take him nearly as far as he expected.
Michael Polanyi coined the term “tacit knowledge” to describe something we only understand as part of something else. When we speak, for instance, we don’t focus on the sounds we’re making, we focus on our words. We understand how to speak, but would struggle to explain it. Tacit knowledge comprises the vast majority of what we know; we rely on it constantly.
When that knowledge begins to lead us astray, however, Polanyi tells us that we must delve into it. We must make it explicit. An explicit understanding of speech might be necessary for someone with a speech impediment, but also for a professional performer; to be at the top of your field, it’s almost always necessary to transform innate talent into something more.
A rare exception to this rule is the chicken sexer, who can quickly and accurately determine whether a day-old chick is male or female. The two are indistinguishable to most, but an expert can classify a thousand chicks an hour with 98% accuracy. The knowledge underpinning this expertise has never been made explicit; a trainee is simply corrected by an expert, over and over, until their intuition is equally refined.
Held to this standard, however, we all fall short. No one’s sensibilities about software design are so refined that they can teach simply through demonstration. We have to distill our intuition down to principles, and let those principles guide us beyond the bounds of our intuition. Anything less is just rentier pedagogy; maxims stripped of any context, whose true meaning within a given situation can only be judged by a single person.
This is the essence of modern “thought leadership”, and it’s served Graham well. His essays on language design, as well as a few on startups, brought the first entrepreneurs to his fledgling VC fund. People applied to YCombinator because they wanted Graham to apply his intuition to their problems.
Graham’s essays on startups were much the same as his essays on language design, but they served a different purpose. They were marketing content, and in that role they excelled. They tantalized the reader by reducing complex problems down to singular, nebulous concepts. They hinted at deep insights that were just out of reach. They made people want to be in the same room as Paul Graham.
Since leaving the fund, however, Graham’s reach has once again begun to exceed his grasp. He applied his intuition to social and economic trends in the latter half of the 20th century, and deemed the modern startup to be their apotheosis. Confronting the assertion that “billionaires exploit workers”, he simply observed that “a startup must sing for its supper, by making things that genuinely delight its customers,” as if one had anything to do with the other.
A recurring theme in his essays, both before and after his time at YCombinator, is conformity. Graham is resolutely on the side of the non-conformist, repeatedly tracing their lineage back to Galileo’s insistence that the earth moves, even after the church had forced him to recant.
In the most recent entry in this series, Graham introduces a quadrant model with axes of aggressive/passive and independent/conventional. He portrays the (small, vulnerable) group of aggressively independent people as the protagonists, and the (large, rabid) group of aggressively conventional people as their antagonists.
Graham writes “the call of the aggressively independent-minded is ‘Eppur si muove’,” but he doesn’t pause to consider that it is also “EARTH HAS 4 CORNER SIMULTANEOUS 4-DAY TIME CUBE” and, more worryingly, “Jews will not replace us”. His model exists largely so he can focus on the one quadrant he finds interesting, but even that is a proxy for a much smaller group, the moral and intellectual heirs of Galileo, the people he intuits to be his peers. Graham doesn’t work through the consequences of his own model because the model doesn’t matter; what matters is sharing some things that feel right and true.
This is all to say that Paul Graham is an effective marketer and practitioner, but a profoundly unserious public intellectual. His attempts to grapple with the major issues of the present, especially as they intersect with his personal legacy, are so mired in intuition and incuriosity that they’re at best a distraction, and at worst a real obstacle to understanding our paths forward.
Unfortunately, this seems unlikely to ever change. In 2019, he announced he was working on a new language, Bel. When asked about its goals, he replied:
All other things (e.g. libraries) being equal, language A is better than language B if programs are shorter in A. (As measured by the size of the parse tree, obviously, not lines or characters.) The goal of Bel is to be a good language. This can be measured in the length of programs written in it.
Of course, “all other things” will never be equal. We can measure the length, but what can we compare it against? Only his radar knows.