dev@whisperer:~$

PL

Conversation with a Machine. Part 3

This will be the final post in the language series. For now, I can’t get any closer to the core of the problem that has both bothered and fascinated me for years. Many of its aspects remain mysterious to me — just like the enormous mystery of where language itself came from and how remarkable the phenomenon of communication really is. And not only communication between humans.

Programming as an art form

I don’t want to get overly sentimental and start calling programming an art. It’s enough that Donald Knuth already did that. After all, people call both Michelangelo’s David and a literal can of shit “art”, so…

That said, code is undeniably a form of creative work built out of written words. This idea is echoed in the Code as Speech movement — whose distant ripple effect is that some of us can even deduct software development costs from our taxes.

For context: “Code as speech” originated in American legal debates. The core idea is simple. Code is a form of individual expression protected under freedom of speech. In the U.S., that means protection under the First Amendment.

Framing the issue this way has major consequences not only for software developers. It’s worth remembering that as recently as the 1990s, strong cryptography was legally treated as a form of weaponry (“munitions”), and exporting it from the United States was heavily restricted by the government.

Programmer Daniel Bernstein, widely considered the father of the “Code as Speech” movement, challenged this in court while fighting for the right to publish the source code of his encryption system, Snuffle. His argument was that the program constituted scientific speech. The court agreed, and Bernstein became something of a folk hero in the fight against software censorship.

Free Speech Flag

Free Speech Flag. Each stripe color corresponds to three hexadecimal RGB numbers. The full hex sequence encodes a cryptographic key that allowed users to copy HD DVD and Blu-ray discs.

If we treat code as a form of speech, then in a free country — such as the United States and many other democratic states that follow its legal philosophy — no individual or institution should be able to forbid its publication.

Sounds great, but there’s a catch.

It also means programmers are free to publish their “creations” regardless of whether they have noble purposes.

Treating code as a form of expression comes with all the strengths and weaknesses of free speech. On one hand, we gain the limitless ability to create and share open-source software. On the other hand, we also get an equally limitless space for publishing exploits, deepfake models, CAD files for 3D-printed guns, and plenty of other questionable artifacts.

Boilerplate and verbal filler

Within the Code as Speech movement, comparing code directly to spoken language isn’t entirely accurate. Code is fundamentally material. In that sense it’s closer to written work — almost a kind of literature.

Just… a somewhat limited one.

In the raw building blocks of code — programming languages themselves — you won’t find the subtleties that define human communication: irony, sarcasm, a wink to the reader. Even when these appear in comments or function names, they are completely invisible to the machine.

Human speech, by contrast, is full of such ornamentation.

Programming languages are also much more economical with words — especially when the programmer follows the principle of DRY. Still, many languages contain a surprising amount of verbal padding known as boilerplate code.

The term comes from the printing industry, where “boilerplate” referred to reusable printing plates used for repetitive content such as advertisements. These plates were made from rolled steel — the same material used to manufacture boilers.

Boilerplate code, while not always strictly necessary, is often unavoidable in strongly typed and object-oriented languages such as Java or C#. Its structure is rigid and repetitive, showing virtually no variation.

Human conversations look very different — especially among people who know each other well. Discussions often resemble a drunk aunt at a wedding: the same stories are told again and again, but each time slightly differently, sometimes morphing beyond recognition.

Our everyday conversations contain plenty of repetition, jokes, lip smacks, burps, digressions, and anecdotes. In other words: a lot of things that may not carry essential information but help people socialize, bond, argue, and simply spend time together.

Machines, naturally, have no concept of this. And don’t be fooled by chatty AI agents.

Natural languages also contain phenomena such as:

  • neologisms
  • synonyms
  • metaphors
  • rhyme and rhythm
  • onomatopoeia
  • profanity

Despite coming from very different linguistic categories, they share one important trait: none of them truly exist in programming languages.

Neologisms might appear in variable or function names. But onomatopoeia? Rhymes? Why would a machine need them? Profanity? Again — only in comments (although allegedly code containing swear words is statistically better).

Examples could go on.

I’m bringing these phenomena up partly to poke at the Code as Speech movement. My point is that code lacks many of the elements that transform a random stream of words into a fully fledged act of speech — a genuine articulation of thought.

And although I personally think programming resembles speech about as much as a goat resembles a guitar, I still strongly support the movement’s principles.

Between censorship and freedom, I’d rather stand on the side of freedom.

Native speakers

Programming languages — much like natural languages — can be categorized into different types.

In natural languages, classifications are usually based on origin. Polish, for example, descends from the hypothetical Proto-Indo-European language, which later evolved into multiple language families across the globe. Today, communication between speakers of these families often requires prior language learning.

For example, a German speaker and a citizen of the United States speak mutually unintelligible languages, even though both ultimately trace their origins back to the same proto-language.

Programming languages, however, are categorized differently, and genealogy is rarely the most important factor.

We typically divide them according to:

  • Level of abstraction

low-level languages (C, Assembly) versus high-level languages (Python, Java, JavaScript)

  • Paradigm

Object-Oriented Programming (OOP) versus Functional programming and sometimes Procedural programming

  • Execution model

compiled versus interpreted versus hybrid

  • Typing system

static versus dynamic

When describing a programming language, we usually define it according to these criteria.

How does this compare with natural languages?

In much the same way that a native Spanish speaker would likely understand nothing from a conversation in Japanese, code written in Rust cannot be executed by a Python interpreter.

The reason is simple: a Python interpreter understands only Python syntax. It cannot process code written in another language without some form of translation layer.

On the other hand, any program written in one programming language can theoretically be translated into another language without losing meaning — as long as both languages are Turing complete and the code hasn’t been compiled yet.

In theory, you could translate it into a sequence of x86 mov instructions — or even into the card game Magic: The Gathering, which according to various experiments is also Turing complete.

Of course, the result would hardly be efficient.

Compiled code is a different story. Compilation is a lossy process. Once code has been compiled, it cannot be automatically reconstructed into its original high-level source form. This is precisely what makes reverse engineering so difficult. It’s a bit like trying to extract the original ingredients from a cake after it’s already been baked.

How does translation work between natural languages?

That’s an enormous topic. Without diving into theoretical nuances, we can say that any natural language can be translated into another — at least to some extent. The result may not be perfect, but it is usually good enough for people from different cultures and linguistic backgrounds to communicate.

Despite what extreme linguistic relativists might claim, I believe that basic understanding is possible even between speakers of languages that are culturally and genetically very distant.

Words, words, words

LORD POLONIUS

(…)

What do you read, my lord?

HAMLET

Words, words, words.

In the end, all languages consist of words — units that carry meaning.

A word doesn’t necessarily have to be written. Sign languages are a good example. What matters is that a “word”, understood as a unit of meaning, can be defined.

Many words look and sound the same but have different meanings (homonyms). Sometimes their meaning depends on context (remember deixis from the previous post?).

Interestingly, a word is also a unit of meaning for a processor. The size of a machine word defines how many bits of information a processor architecture can handle at once.

Typically this is 32 or 64 bits, although historically machine words could have arbitrary lengths. The same was once true for the byte, which only became standardized as eight bits in the 1960s with the introduction of IBM System/360, a computer architecture that established a market standard still in use today.

In spoken language we rely on tone, voice timbre, and pauses. They help us distinguish irony from questions or anger. In writing, this role is largely played by punctuation.

Usually punctuation simply helps readers understand written text. People can often interpret sentences even without it — although sometimes punctuation completely determines meaning.

Truck drivers working for a dairy company learned this the hard way when a missing comma in labor law forced their employer to pay them 5 million dollars in overtime.

In programming, symbols equivalent to punctuation are integral elements of the language.

For example:

  • commas usually separate function arguments or elements in arrays
  • indentation in Python defines blocks of code
  • semicolons in many popular languages mark the end of a statement

These symbols are essential for compilers, which rely on context-free grammars to translate high-level languages into instructions understandable by processors.

Compilers are extremely allergic to ambiguity. They happily throw errors the moment a comma appears in the wrong place or a bracket isn’t properly closed.

Laurie Kirk made a fantastic video about this topic, which I highly recommend:

This obsession with unambiguous meaning excludes many linguistic subtleties — such as irony, sarcasm, or metaphor — topics I’ve already complained about in this post.

Or does it?

In the end, I can’t resist pointing out that metaphor may actually lie at the foundation of many programming constructs.

What else is a void function if not a metaphor for absence?

What about the ominous die in PHP?
Or panic in Go?
Or the sealed modifier in C#?
Or yield in Python?

I have a feeling these questions bring us right back to where this entire series began — road signs.

After all, they are words too.

Words, words, words.

Yield Traffic Sign

Photo by Matt Bango, CC license, [source](https://freerangestock.com/photos/167842/yield-traffic-sign-with-blurred-background.html)