```
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.State
```

# The ultimate compiler

We previously defined a programming language with the help of Turing machines:

interpret :: Language -> TM

We chose poorly.

In the classes I took, any exercises asking for a Turing machine invariably involved pitifully trivial problems, such as copying some number of 1s.

Later, they introduced Universal Turing Machines, but without actually showing us one because it was too much trouble. With universal machines, they explained why it’s impossible to decide if a given Turing machine ever halts.

So from day one, they warned us Turing machines are clumsy beasts where easy tasks are hard to accomplish, and generally hang unpredictably.

## Lambda Calculus

Everyone knows lambda abstractions nowadays. C++ got them. JavaScript got them. Even everything-is-an-object Java got them.

Less well-known is that lambdas alone are equal in power to Turing machines. We can toss out states, tapes, read/write heads, and do nothing but repeatedly substitute a variable with an expression, yet still compute just as effectively. If LC denotes the set of all closed lambda terms, then the following mutually inverse functions are known to exist:

fromTM :: TM -> LC fromLC :: LC -> TM

We’re taking some artistic license with the word "inverse" as a round trip may take us to a different program. The point is it’ll behave identically to the original.

Lambdas are easy for humans to understand. Practicians love them so much that popular languages eventually support them, while theoreticians define programming languages with lambda calculus. In fact, our implementation of Turing Machines is written in syntactically sweetened lambda calculus so is close to a candidate for fromTM.

Lambdas are easy for computers to understand. Compiler textbooks describe transforming source code into "SSA form", another name for lambda calculus, so it can be readily analyzed and manipulated.

Thus one way or another, a source language is often defined in terms of lambda calculus. Accordingly, we revise our definition of a programming language to be a function taking a program to a closed lambda calculus term:

interpret :: Language -> LC

Naturally, composing with fromLC takes us back to Turing machines, but let’s not go there. 'Tis a silly place.

## What is lambda calculus?

Lambda calculus terms can be represented by trees of the form:

type VarId = String data LC = Var VarId | Lam VarId LC | LC :@ LC

A program is represented by a *closed* lambda term, which means every Var
node must be a descendant of a Lam node with a matching VarId.

Normally, we next talk about variable substitution (or *beta reduction*) to
describe the *dynamic semantics*, that is, how to compute with a lambda term.
We leave that for the textbooks (or see my notes on lambda
calculus). For our purposes, a closed lambda term is merely notation for a
*combinatory logic* term, which is a full binary tree whose leaves can be one
of 6 different values:

▶ Toggle extensions and imports

```
data Com = S | K | I | B | C | T
```

We actually only need S and K; the others can be thought of as macros.

```
data CL = Lf Com | CL :# CL | Ext String
```

The Ext String field comes into play later, when we want to mix external functions with our combinators.

The rewrite function below rewrites a closed LC term as a CL term, using an
algorithm known as *bracket abstraction*. See
Smullyan’s "To Mock a
Mockingbird" for a particularly enjoyable explanation of why this results in a
combinatory logic term whose meaning matches that of the original lambda term.

```
type VarId = String
data LC = Var VarId | Lam VarId LC | LC :@ LC | Other Com
deLam :: LC -> LC
deLam t = case t of
Lam v u -> case deLam u of
Var w | v == w -> Other I
| otherwise -> Other K :@ Var w
Other c -> Other K :@ Other c
x :@ y -> Other S :@ deLam (Lam v x) :@ deLam (Lam v y)
x :@ y -> deLam x :@ deLam y
_ -> t
rewrite :: LC -> CL
rewrite t = case deLam t of
Other c -> Lf c
x :@ y -> rewrite x :# rewrite y
```

We tweaked our earlier definition of LC so they are combinator-aware; during bracket abstraction, we produce intermediate Frankenstein terms that are an amalgam of CL and LC terms.

## What is combinatory logic?

To execute a combinatory logic term, we *reduce* subterms matching certain
patterns to other subterms:

```
reduce :: CL -> Maybe CL
reduce t = case t of
Lf I :# x -> Just x
Lf K :# x :# y -> Just x
Lf T :# x :# y -> Just (y :# x)
Lf S :# x :# y :# z -> Just (x :# z :# (y :# z))
Lf B :# x :# y :# z -> Just (x :# (y :# z))
Lf C :# x :# y :# z -> Just (x :# z :# (y ))
_ -> Nothing
```

The S combinator duplicates one of its arguments. Although we think of the
result as a tree, in our implementation, we wind up with two nodes pointing to
the same copy of the argument that is duplicated, that is, we employ *sharing*
to conserve memory. The S combinator also means a tree need not shrink after a
so-called reduction.

If none of the patterns appear, then no reductions are possible and the term is
said to be in *normal form*. Otherwise one or more subterms can be reduced, and
we must choose which to reduce. After a reduction, new reducible subterms may
appear, and again we must choose.

One strategy is to reduce in *normal order*: repeatedly reduce the leftmost
subtree that can be reduced.

If a term can be reduced to a normal form (which is in some sense unique by Church-Rosser), then normal-order reduction will find it. Other evaluation orders might never terminate even when a normal form exists.

The left *spine* of the tree is the path that starts from the root node and
recursively follows the left child. To evaluate in normal order, we walk down
the left spine until we bottom out, then reduce as we walk back up to the root;
on each reduction, we must walk back down again in case the replacement subtree
can be reduced. Afterwards, we again walk down the left spine to the bottom,
and this time as we walk back up, we recursively normalize the right branches.

```
normalize :: CL -> CL
normalize t = down t [] up1 where
down t args k = case t of
x :# y -> down x (y:args) k
_ -> k t args
up1 t args = case reduce t of
Just t' -> down t' args up1
Nothing -> case args of
[] -> down t [] up2
a:as -> up1 (t :# a) as
up2 t args = case args of
[] -> t
a:as -> up2 (t :# normalize a) as
```

For our computations, it turns out we only need our terms to reach *weak head
normal form*, which means we skip the second trip.

```
run :: CL -> CL
run t = down t [] where
down t args = case t of
x :# y -> down x (y:args)
_ -> up t args
up t args = case reduce t of
Just t' -> down t' args
Nothing -> case args of
[] -> t
a:as -> up (t :# a) as
```

## Input and output

Turing machines have a tape to handle input and output. With combinatory logic, the program is a term, the input string is encoded as a term, and the output string is the decoding of the program term applied to the input term.

We use
the *Scott encoding*. Strings are Scott-encoded lists of
Scott-encoded Peano numbers.

Encoding and decoding requires us to evaluate combinators alongside our own functions, hence the runM function and the state monad.

```
runM :: Monad m => (CL -> m (Maybe CL)) -> CL -> m CL
runM f t = down t [] where
down t args = case t of
x :# y -> down x (y:args)
_ -> up t args
up t args = do
m <- f t
case m of
Just t' -> down t' args
Nothing -> case args of
[] -> pure t
a:as -> up (t :# a) as
encodeChar :: Char -> CL
encodeChar c = go (fromEnum c) where
go 0 = Lf K
go n = Lf K :# (Lf T :# go (n - 1))
decodeChar :: CL -> Char
decodeChar t = toEnum $ execState (runM red $ t :# Ext "Z" :# Ext "S") 0 where
red t = case t of
Ext "S" :# a -> do
modify (1+)
pure $ Just $ a :# Ext "Z" :# Ext "S"
_ -> pure $ reduce t
encode :: [Char] -> CL
encode "" = Lf K
encode (c:cs) = Lf K :# (Lf C :# (Lf T :# encodeChar c) :# encode cs)
decode :: CL -> [Char]
decode t = execState (runM red $ t :# Ext "nil" :# Ext "cons") id "" where
red t = case t of
Ext "cons" :# x :# xs -> do
modify (. (decodeChar x:))
pure $ Just $ xs :# Ext "nil" :# Ext "cons"
_ -> pure $ reduce t
```

## One size fits all

We can now run a combinatory logic program:

```
runCom :: CL -> String -> String
runCom t s = decode (run (t :# encode s))
```

If we believe it is easy to rewrite runCom and reduce in the target language, then we can whip up the following function in no time:

gen :: CL -> TargetLanguage

Thus we have a compiler which works on lambda calculus:

ultimateCompiler :: LC -> TargetLanguage ultimateCompiler = gen . rewrite

By our definitions, this implies we can compile any given language by composing the above function with the definitional interpreter of the language:

compile :: Language -> TargetLanguage compile = ultimateCompiler . interpret

Mission accomplished?

## Well-defined

Can all programming languages be defined with our version of lambda calculus? After all, we chose a particular order of evaluation.

Yes!
Reynolds'
landmark paper of 1972 surveys multiple languages defined in multiple variants
of lambda calculus, then reveals this variety is unnecessary. Thanks to
*defunctionalization* and *continuation-passing style* (CPS), we can get by
with a lambda calculus without first-class functions (is this the same as
kappa calculus?), and which
evaluates in any given order.

A few years later, Steele and Sussman wrote the lambda papers: a series of cookbooks describing how to define many practical programming constructs in lambda calculus.

Hutton and Bahr *calculate* a
correct compiler from its specification, showing the power of precise
definitions. They have also fused
CPS transformation and defunctionalization into a single step.

Elliott’s work on compiling to categories uses bracket abstraction to yield a compelling alternative to domain-specific languages.

Chlipala wrote
*A Certified Type-Preserving Compiler from Lambda Calculus to Assembly
Language*, which we take as a license to be informal when we please.
We can always fall back to this paper to see how it’s really done!

See also Papers We Love.

*blynn@cs.stanford.edu*💡