Implementing a Correct Type-Checker for the Simply Typed Lambda Calculus

November 25th, 2009 by Darin Morrison

Last Friday I discussed how to implement a “correct” type-checker for the simply typed lambda calculus in a dependently typed metalanguage – in this case Agda.

What does it mean to implement a correct type-checker? Stepping back for a moment, most people have probably at one point or another encountered a buggy type checker. Such a type checker will occasionally give the wrong result, either (1) by saying that an ill-typed term does indeed type-check, or (2) by saying that a well-typed term doesn’t type-check.

For example, if we have a typing judgement Γ ⊢ M : T and a type-checking function check : Ctx → Term → Type → Bool, a buggy type-checker could give either of the following:

(1) check Γ M T = true, even though Γ ⊢ M : T is not derivable

(2) check Γ M T = false, even though Γ ⊢ M : T is derivable

A correct type-checker should be constructed in such a way that both of these cases are always ruled out.

In order to construct such a type-checker, we will leverage the richness of our type system to prevent us from writing buggy code. Looking back at the type of our check function, we see that it returns a Bool. The problem is that there is nothing in this type that ensures that the value we return is actually related to our input in any way. Our check function could be a constant function returning true for all we know.

We can fix this by refining the result type to a sum type of proofs:

    check : (Γ : Ctx) (M : Term) (T : Type) → (Γ ⊢ M : T) ⊎ ¬ (Γ ⊢ M : T)

What’s going on here? We’re saying that if type-checking is successful, we should have a proof in the form of a typing derivation. If type-checking fails, we want a proof that such a derivation can’t exist (or we would have a contradiction).

Stepping back again, if we look at this new type for the type-checking function through our classical glasses, we obtain:

    ∀ Γ M T. (Γ ⊢ M : T) ∨ ¬ (Γ ⊢ M : T)

which is completely trivial by the LEM. However, we are working in an intuitionistic metalanguage and by the disjunction property of intuitionistic logic, an implementation of our well-specified checking function is both a proof that type-checking is decidable and a decision procedure for actually doing so!

This is all quite nice so far, except for the fact that the standard typing rules for the STLC are non-algorithmic. By this I mean that one cannot transcribe the typing rules completely straightforwardly as a functional program. This is most apparent in the application case:

    Γ ⊢ M : S → T    Γ ⊢ N : S
    ──────────────────────────
    Γ ⊢ M N : T

This rule must be read bottom-up to be consistent with an algorithmic interpretation realised as a structurally recursive checking function. But where does S come from? We have to more or less pull this type out of thin air to continue the recursive type-checking. Also, in a real implementation, even if we are able to somehow infer some type, say S′, how do we know that S ≡ S′?

There are ways to get around this, such as by modifying the check function to also return a type, and implementing equality for types, but the problem with this approach is that we now have a mismatch between how we understand the typing judgments and how we understand type-checking. Furthermore, the specification of the check function becomes more complicated.

A better solution is to realise that we need to change the typing rules themselves to reflect when a term should be analysed against a given type and when a type should be synthesised from a term (as in the case for the recursive call on M).

We do this by splitting the original judgement into separate analytic and synthetic judgments.

Γ ⊢ M ⇐ T   (M checks against T)

Γ ⊢ M ⇒ T   (M synthesises T)

Now we can rewrite the application rule so that, read bottom up, it provides an algorithm from which a case arm in a functional program is immediately derivable:

    Γ ⊢ M ⇒ S → T    Γ ⊢ N ⇐ S
    ──────────────────────────
    Γ ⊢ M N ⇒ T

(As a side note, there is an interesting philosophical argument described by Zeilberger et al. as to why the synthetic judgement should be read top-down and the analytic judgement bottom-up, but I will skip the details here…)

The full set of rules so far becomes as follows:

Γ ⊢ M ⇐ T   (M checks against T)

    Γ, x : S ⊢ M ⇐ T
    ────────────────────
    Γ ⊢ λx:S. M ⇐ S → T

Γ ⊢ M ⇒ T   (M synthesises T)

    x : T ∈ Γ
    ─────────
    Γ ⊢ x ⇒ T

    Γ ⊢ M ⇒ S → T    Γ ⊢ N ⇐ S
    ──────────────────────────
    Γ ⊢ M N ⇒ T

T ≡ T′   (T is equal to T′)

    Γ ⊢ M ⇒ T    T ≡ T′
    ───────────────────
    Γ ⊢ M ⇐ T′

These rules should seem right with the exception of the last one. We need this rule so that we can, for example, check a variable in the body of a lambda. But there’s something wrong with it.

The problem is that, unlike all of the other rules, it is not associated with a particular constructor or destructor, which makes it non-syntactic. If we take it as it is, type-checking will no longer be deterministic and syntax directed, which would be bad given all we’ve considered so far.

We need to introduce a new constructor to make this rule syntax directed; let’s call it “syn”, because when we encounter it, we recursively synthesise a type for its argument:

    Γ ⊢ M ⇒ T    T ≡ T′
    ───────────────────
    Γ ⊢ syn M ⇐ T′

So we’ve added a constructor to the term language, which makes type-checking syntax directed again. But careful consideration reveals that we’ve created a new problem. Some terms which should be checkable are no longer checkable if they occur under “syn”. Consider a term of the form syn (λx:S. M). If we know the type of the codomain of the lambda term, we should be able to check it, but in this case our type-checking algorithm is going to try to infer the type of the lambda term, and looking up at our rules, we see there is no synthesis rule for lambda, so checking is going to fail.

Luckily, we can fix this too. We need to break the terms M up into two syntactic categories, one for checkable terms and one for synthesisable terms:

    M ::= λx:S. M | syn R      (checkable terms)
    R ::= x | R M              (synthesisable terms)

Seems good so far, right? Well, not quite. We’ve successfully broken the original term language up into the appropriate syntactic categories, but in the process we’ve restricted the language in such a way that we cannot write any terms containing redexes. Everything we can write is already in normal form. Oops!

The culprit here is the fact that the only thing we can put in the left hand side of an application is an R term, but lambdas are M terms. We need a way to embed M terms back into R terms. Since the only real checkable term is a lambda term, and we cannot infer types for these, how can we possibly embed them into R terms? By adding a type annotation! Let’s add a constructor of the form (M : T). If we were to put this on a lambda term, the type annotation for the variable would become superfluous, so we might as well drop that entirely and only use the annotation constructor. Let’s consider the final system with all of our changes in place:

    M ::= λx. M | syn R        (checkable terms)
    R ::= x | R M | (M : T)    (synthesisable terms)

Γ ⊢ M ⇐ T   (M checks against T)

    Γ, x : S ⊢ M ⇐ T
    ─────────────────
    Γ ⊢ λx. M ⇐ S → T

    Γ ⊢ R ⇒ T    T ≡ T′
    ───────────────────
    Γ ⊢ syn R ⇐ T′

Γ ⊢ R ⇒ T   (R synthesises T)

    x : T ∈ Γ
    ─────────
    Γ ⊢ x ⇒ T

    Γ ⊢ R ⇒ S → T    Γ ⊢ M ⇐ S
    ──────────────────────────
    Γ ⊢ R M ⇒ T

    Γ ⊢ M ⇐ T
    ───────────────
    Γ ⊢ (M : T) ⇒ T

T ≡ T′   (T is equal to T′)

What impact is all of this going to have on our checking function? First, we’re going to have to create a new function corresponding to the synthetic judgement, and refine the types of the arguments slightly to reflect the fact that we are now dealing with two separate syntactic categories:

    check : (Γ : Ctx) (M : Chk) (T : Type) → (Γ ⊢ M ⇐ T) ⊎ ¬ (Γ ⊢ M ⇐ T)
    synth : (Γ : Ctx) (R : Syn) → (∃ λ T → Γ ⊢ R ⇒ T) ⊎ ¬ (∃ λ T → Γ ⊢ R ⇒ T)

With all of these changes in place, it becomes pretty straightforward to write ‘check’ and ‘synth’ as two mutually recursive functions. The types will prevent us from writing buggy code and our type checker should be correct by construction.
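
To make the algorithmic reading concrete, here is a minimal Haskell sketch of the two functions. All names are my own, it uses de Bruijn indices for brevity, and it deliberately drops the proof objects – returning Bool and Maybe Ty instead – so it does not capture the correctness-by-construction aspect of the Agda version:

data Ty = Base | Ty :-> Ty
  deriving (Eq, Show)

-- Checkable terms (M) and synthesisable terms (R).
data Chk = Lam Chk        -- λ. M     (the bound variable is a de Bruijn index)
         | Syn Syn        -- syn R    (switch from checking to synthesis)

data Syn = Var Int        -- x        (de Bruijn index)
         | App Syn Chk    -- R M
         | Ann Chk Ty     -- (M : T)

-- Contexts: the type of de Bruijn index 0 comes first.
type Ctx = [Ty]

check :: Ctx -> Chk -> Ty -> Bool
check ctx (Lam m) (s :-> t) = check (s : ctx) m t   -- push the domain, check the body
check _   (Lam _) _         = False                 -- a lambda only checks against an arrow type
check ctx (Syn r) t         = synth ctx r == Just t -- synthesise, then compare types

synth :: Ctx -> Syn -> Maybe Ty
synth ctx (Var i)
  | i >= 0 && i < length ctx = Just (ctx !! i)
  | otherwise                = Nothing
synth ctx (App r m) =
  case synth ctx r of
    Just (s :-> t) | check ctx m s -> Just t        -- the argument is checked against the domain
    _                              -> Nothing
synth ctx (Ann m t)
  | check ctx m t = Just t
  | otherwise     = Nothing

For instance, check [] (Lam (Syn (Var 0))) (Base :-> Base) evaluates to True, while synth [] (Var 0) evaluates to Nothing.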

If we plan to use this type-checker in the “real world”, we might want to change the negative evidence type to two separate inductive types:

    Γ ⊬ M ⇐ T   (M does not check against T)

and

    Γ ⊬ R ⇒   (R does not synthesise any type)

Then we would want to show that these new inductive types are sound with respect to the negation of the original judgements:

    Γ ⊬ M ⇐ T  implies  ¬ (Γ ⊢ M ⇐ T)

and

    Γ ⊬ R ⇒  implies  ¬ (∃ λ T → Γ ⊢ R ⇒ T)

This would allow us to produce error messages based on the negative evidence, since it is inductive whereas the negation (which is a function type) is not.
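
In terms of the Haskell sketch above (reusing its Ty type), the analogue of such inductive negative evidence would be a pair of error datatypes with one constructor for each way that checking or synthesis can fail; the constructor names here are hypothetical:

data CheckFailure
  = LamAgainstNonArrow Ty        -- a lambda was checked against a non-arrow type
  | Mismatch Ty Ty               -- the expected type and the synthesised type differ
  | CheckViaSynth SynthFailure   -- checking a 'syn' term failed because synthesis did

data SynthFailure
  = UnboundVariable Int          -- the variable is not bound in the context
  | AppOfNonFunction Ty          -- the head of an application synthesised a non-arrow type
  | BadArgument CheckFailure     -- the argument of an application failed to check
  | BadAnnotation CheckFailure   -- the annotated term failed to check against its annotation

Error messages then come from pattern matching on these values, which is exactly the point made above.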

**UPDATE 1**

I’ve added some screenshots showing an implementation of what I describe above in action:

Type-checking the S combinator:

Type-checking an incorrect version of the S combinator:

Type-checking (and normalising) an arbitrary term:

** UPDATE 2 **

If you would like a copy of the source code, feel free to email me.

Also, I should point out that the idea of constructing a correct type-checker is not original.  At least one earlier iteration of a very similar idea was described by McBride and McKinna in “The view from the left”.  There are two differences between what I do here and their development.  The first is that I use a bidirectional type system instead of the traditional one.  The other is that I show that the judgement for negative evidence is sound with respect to the negation of the judgement for positive evidence.  This step allows me to prove that type-checking and type-inference are decidable.

Scala: An Object-Oriented Language with Functional Programming Features

November 25th, 2009 by Darin Morrison

A couple of weeks ago I gave an overview of the Scala language.  This was originally prompted by a recent discussion about languages for teaching and what might be a nice replacement for Java.  I suggested Scala and since I was familiar with the language I was asked to give an introduction.

Scala is an object-oriented language with a sophisticated and flexible type system.  It also supports quite a few features which are familiar to functional programmers.  Some of the features of Scala include:

* Bounded (upper and lower) parametric polymorphism

* Pattern matching via case classes

* First-class “functions” via functional objects

* Type constructor polymorphism (quantifying over types with higher kinds)

* A generalized comprehension syntax (can be used for sequence comprehension or for monadic computations)

* Existential types

etc.

Although Scala’s type system is rather sophisticated, the basics of the language are easy to learn and there are quite a few tutorials and references available online:

http://www.scala-lang.org/node/197

Parser combinators are as expressive as possible

October 28th, 2009 by Nils Anders Danielsson

What is the expressive strength of parser combinators? Which languages can be expressed using them? The question depends on the definition of parser combinators used. In our last meeting I showed that a simple parser combinator library which I have implemented in Agda is as expressive as possible: it can be used to recognise exactly those languages (sets of bit strings) for which a decision procedure can be implemented in Agda.

Consider the following language of parser (or perhaps recogniser) combinators:

data P : Bool → Set where
  ∅   : P false
  ε   : P true
  tok : Bool → P false
  _∣_ : ∀ {n₁ n₂} → P n₁ →        P n₂  → P (n₁ ∨ n₂)
  _·_ : ∀ {n₁ n₂} → P n₁ → ∞? n₁ (P n₂) → P (n₁ ∧ n₂)

The parsers are indexed on their nullability; the index is true iff the parser accepts the empty string. The first four combinators are unsurprising: failure, the recogniser for the empty string, the recogniser for a given token, and symmetric choice. The last combinator, sequencing, is more interesting: the second argument is allowed to be coinductive if the first argument does not accept the empty string:

∞? : Bool → Set → Set
∞? true  A =   A
∞? false A = ∞ A

(Read ∞ A as a thunk, i.e. a suspended computation of type A.) This ensures that grammars can be infinite, but only in a controlled way. If you check out the code you will find a formal semantics for these combinators, and a (verified and total) procedure which decides whether a given string is a member of a given language defined using the combinators. This shows that every language definable using the combinators is decidable. (For more details, see Mixing Induction and Coinduction.)

What about the other direction? Given a decision procedure of type List Bool → Bool a corresponding grammar can be produced. The idea is to build an infinite tree with one “node” for every possible input string, and to make this “node” accepting iff the decision procedure accepts the given string:

accept-if-true : ∀ n → P n
accept-if-true true  = ε
accept-if-true false = ∅

grammar : (f : List Bool → Bool) → P (f [])
grammar f = tok true  · ♯ grammar (f ∘ _∷_ true )
          ∣ tok false · ♯ grammar (f ∘ _∷_ false)
          ∣ accept-if-true (f [])

(Read ♯_ as “delay”, i.e. as the constructor of a thunk.) For a formal proof of the correctness of the grammar, see the Agda code. I have also generalised the proof to proper parser combinators, for which the statement is that, for every R, every function of type List Bool → List R can be realised using parser combinators (if set equality is used for the list of results); if the bind combinator is allowed, then the result easily generalises to arbitrary token types.

The language defined by grammar f above can be parsed using deterministic recursive descent with one token of lookahead; grammar f could be viewed as an “infinite LL(1)” grammar. I suspect that this grammar scheme can be implemented using many parser combinator libraries. I also suspect that the result above is already known. It has been known for a long time that a potentially infinite grammar can represent any countable language (I found this mentioned in Marvin Solomon’s PhD thesis from 1977), and given that fact the result above is only a small step away. Pointers to the literature are welcome.
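
As a small sanity check of that suspicion, here is a rough Haskell transliteration of the construction (the datatype and function names are mine, laziness stands in for the coinductive sequencing argument, and the nullability index is simply dropped, so nothing is verified or total by construction here):

data P
  = Fail          -- ∅
  | Eps           -- ε
  | Tok Bool      -- tok
  | P :|: P       -- choice
  | P :.: P       -- sequencing; laziness plays the role of ∞?

-- Does the recogniser accept the given string?
accepts :: P -> [Bool] -> Bool
accepts Fail      _ = False
accepts Eps       s = null s
accepts (Tok t)   s = s == [t]
accepts (p :|: q) s = accepts p s || accepts q s
accepts (p :.: q) s =
  or [ accepts p s1 && accepts q s2 | (s1, s2) <- splits s ]
  where
    splits xs = [ splitAt n xs | n <- [0 .. length xs] ]

acceptIfTrue :: Bool -> P
acceptIfTrue True  = Eps
acceptIfTrue False = Fail

-- One "node" per possible input string, accepting iff f accepts that string.
grammar :: ([Bool] -> Bool) -> P
grammar f =
      (Tok True  :.: grammar (f . (True  :)))
  :|: (Tok False :.: grammar (f . (False :)))
  :|: acceptIfTrue (f [])

Because (&&) only forces its second argument when the first token has matched, accepts (grammar f) s unfolds the infinite grammar no deeper than the length of s, so the call terminates and agrees with f s; unlike in the Agda version, however, nothing in the types enforces this.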

Factorising folds for faster functions

October 16th, 2009 by Graham

When I visited Galois a couple of years ago, I worked with Andy Gill on formalising the worker/wrapper transformation that is used within GHC to improve the performance of programs. The theory that we came up with was simple and general, but was based upon modelling recursive programs in terms of a fixed point operator, and hence wasn’t able to exploit additional structure that may be present in many programs.

In today’s talk (based upon joint work with Mauro Jaskelioff and Andy Gill), I showed how the worker/wrapper theory can be specialised to the structured pattern of recursion encapsulated by fold operators, and discussed the resulting benefits. I gave one simple example, deriving fast reverse, and one more complicated example, verifying the correctness of an efficient implementation of substitution from Voigtlander’s MPC 2008 paper. Further details are in our new paper.
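
For readers who have not seen it, the fast reverse example ends with the familiar accumulator-based program; the sketch below shows only the end result, not the derivation itself (which is carried out with the fold-specialised worker/wrapper theory in the paper):

-- The naive definition is quadratic because of the repeated (++).
reverseSlow :: [a] -> [a]
reverseSlow []     = []
reverseSlow (x:xs) = reverseSlow xs ++ [x]

-- Worker/wrapper factorisation: the worker threads an accumulator,
-- and the wrapper instantiates the accumulator with the empty list.
reverseFast :: [a] -> [a]
reverseFast xs = work xs []
  where
    work []     acc = acc
    work (y:ys) acc = work ys (y : acc)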

Left Kan extensions of containers

October 3rd, 2009 by Thorsten

I find that it is often helpful to look at operations on functors instead of datatypes. An example is the exponential of functors, which turns out to be an operation on containers. Recently, I have been looking at left Kan extensions, or Lans for short. Left Kan extensions are defined as the left adjoint to composition, i.e. the set of natural transformations from F to G \circ J is naturally isomorphic to the set of natural transformations from \mathrm{Lan}_J\,F to G.

There is a useful explicit representation of Lans using coends:
(\mathrm{Lan}_J\,F)\,A = \int^X(J\,X \to A)\times F\,X
(for simplicity I assume that the range of the functors is Set).
The coend is actually the quotient of a large \Sigma-type:
 \dots = (\Sigma X.(J\,X \to A)\times F\,X)/\sim
where \sim is the equivalence relation generated by
 (X,g\circ J\,h,x) \sim (Y,g,F\,h\,x)
for any h\in X\to Y.
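
In Haskell the coend formula can be transcribed as an existential type; the following is only an illustration, since the quotient is left implicit and is respected only up to parametricity:

\begin{code}
{-# LANGUAGE ExistentialQuantification #-}
-- The existentially quantified x plays the role of the \Sigma-bound X above.
data Lan j f a = forall x. Lan (j x -> a) (f x)

instance Functor (Lan j f) where
  fmap g (Lan k fx) = Lan (g . k) fx
\end{code}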

Neil Ghani found a nice application of Lans to give initial algebra semantics to GADTs. As an example consider an encoding of Fin as a GADT:

\begin{code}
data Fin a where          -- using the GADTs extension
  Fz :: Fin (Maybe a)
  Fs :: Fin a -> Fin (Maybe a)
\end{code}
Can we understand Fin as an initial algebra of an endofunctor on the category of endofunctors? A problem is the occurrence of Maybe on the right hand side. Using Lans we can shift this to the right, i.e. Fin = \mu H.\mathrm{Lan}_\mathrm{Maybe}\,(1 + H).

What has this to do with containers? A container is a functor of the form F\,X = \Sigma s\in S.P\,s \to X where S\in\mathrm{Set} and P\in S\to\mathrm{Set}. We write F = S \lhd P. Now given also G = T\lhd Q we can use the coend formula to work out the container representing \mathrm{Lan}_F\,G:
\begin{eqnarray*}
\mathrm{Lan}_F\,G\,A & = & \int^X (F\,X \to A) \times G\,X \\
& = & \int^X ((\Sigma s\in S.P\,s \to X) \to A) \times (\Sigma t\in T.Q\,t \to X) \\
& \simeq & \Sigma t\in T.\int^X ((\Sigma s\in S.P\,s \to X) \to A) \times (Q\,t \to X) \\
& \simeq & \Sigma t\in T.((\Sigma s\in S.P\,s \to Q\,t) \to A)\\
& = & t \in T \lhd \Sigma s\in S.P\,s \to Q\,t
\end{eqnarray*}
The interesting step is the 4th equality from the top. I worked it out using the explicit representation of coends as a \Sigma-type, but maybe there is a more abstract way to prove it. I also checked the equations by plugging them into the definition of a Lan as an adjunction, using the explicit representation of container morphisms.

Now if we apply the formula to \mathrm{Lan}_{\mathrm{Maybe}}\,G, where \mathrm{Maybe} = b\in 2 \lhd b=1, we obtain
\mathrm{Lan}_{\mathrm{Maybe}}\,G = t\in T \lhd \Sigma b\in 2.(b=1)\to Q\,t
which can be further simplified to
 \dots = t\in T \lhd 1 + Q\,t
Hence, Lan_{\mathrm{Maybe}}\,G\,A = A \times G\,A. Indeed, we could have proven this result directly using the Yoneda lemma (which also shows that it doesn’t depend on G being a container). Applying this isomorphism to Fin it turns out that there is an isomorphic datatype which isn’t a GADT, namely:

\begin{code}
data Fin' a = Fz' a | Fs' a (Fin' a)
\end{code}
I leave it to the reader to work out the isomorphism between Fin and Fin’.
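
As a hint towards the first step, here is the isomorphism \mathrm{Lan}_{\mathrm{Maybe}}\,G\,A \simeq A \times G\,A written out with the Lan encoding given earlier (the function names are mine, and the two round trips are identities only up to the coend equivalence relation):

\begin{code}
lanMaybeTo :: Functor g => Lan Maybe g a -> (a, g a)
lanMaybeTo (Lan k gx) = (k Nothing, fmap (k . Just) gx)

lanMaybeFrom :: (a, g a) -> Lan Maybe g a
lanMaybeFrom (a, ga) = Lan (maybe a id) ga
\end{code}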

Btw, what about Right Kan extensions? I can show that if F is a container then \mathrm{Ran}_F\,G always exists (even if G is not a container). However, \mathrm{Ran}_F\,G isn’t necessarily a container even if both F and G are.

Folding Statistics

July 17th, 2009 by Tom Nielsen

Today, I talked about a statistics library I am developing for Haskell that has two advantages over existing libraries on hackage: single-pass statistical calculations with O(1) memory requirements are composable, and they can be applied to any collection type for which we can implement a fold.

A statistic is usually a real-valued function defined on a collection of observations, but let’s generalise a bit and make it polymorphic in the type of the elements of the collection and in the result type. We could represent a statistic (e.g. mean, standard deviation) as a function [a] -> b, but that would have two disadvantages: we are stuck with lists, and if we want to define new statistics (for instance, the coefficient of variation, which is defined as the ratio of the standard deviation to the mean) from old ones, we will traverse the collection at least twice – at least once for each statistic.

The trick to solving this is to use what Max Rabkin calls Beautiful Folding, which is apparently similar to a banana split. We define

data Fold b c = forall a. Fold (a -> b -> a) a (a -> c)

and define a “both” combinator with type Fold b c -> Fold b d -> Fold b (c,d). We can now write the functor instance for Fold:

fmap f (Fold op init k) = Fold op init (f . k)

and applicative

return x = Fold (\_ _ -> ()) () $ const x
f <*> g = uncurry ($) `fmap` (f `both` g)

as suggested by Twan van Laarhoven. We can now write simple definitions

sumF = Fold (+) 0 id
lengthF = realToFrac `fmap` Fold (\n _ -> n+1) 0  id

meanF = (/) <$> sumF <*> lengthF
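
The both combinator was only given by its type above; one possible implementation (a sketch, using the Fold type defined above) pairs the two accumulators and steps them together, so the collection is still traversed only once:

both :: Fold b c -> Fold b d -> Fold b (c, d)
both (Fold opL initL kL) (Fold opR initR kR) =
  Fold (\(x, y) b -> (opL x b, opR y b))   -- step both accumulators on each element
       (initL, initR)                      -- paired initial accumulators
       (\(x, y) -> (kL x, kR y))           -- run both continuations at the end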

Let’s make a small change to the definition of Fold

data Fold b c = forall a. Fold (a -> b -> a) a (a -> c) (a->a->a)

The last field is a function that tells me how to combine the accumulators of two partial folds. I have to update my definitions of the functor and applicative instances, which is trivial. The primitive statistics aren’t much more complicated:

sumF = Fold (+) 0 id (+)
lengthF = realToFrac <$> Fold (\n _ -> n+1) 0  id (+)

But now, if I know how to split my input collection type into chunks, I get parallel statistics, almost for free. I can split my observations into N chunks, fold each chunk on a separate processor, and then combine the folds before running the continuation.
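
Here is a sketch of that chunked evaluation using the extended, four-field Fold (the function name is mine and the real library may differ; actual parallelism could be layered on top with, e.g., the parallel package, but the splitting and merging is the interesting part):

{-# LANGUAGE ExistentialQuantification #-}
import Data.List (foldl')

data Fold b c = forall a. Fold (a -> b -> a) a (a -> c) (a -> a -> a)

-- Fold each chunk separately, merge the per-chunk accumulators with the
-- combining function, and only then run the continuation.  This assumes the
-- initial accumulator is a unit for the combining function, as it is for
-- sums and counts.
foldChunked :: Fold b c -> [[b]] -> c
foldChunked (Fold step start done merge) chunks =
  done (foldl' merge start (map (foldl' step start) chunks))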

Some observations about μν

July 10th, 2009 by Nils Anders Danielsson

Today Thorsten and I talked about nested fixpoints of the form μX.νY.F X Y. It turns out that such fixpoints cannot be represented directly in Agda. More details are available in a post to the Agda mailing list.

Additionally here is the whiteboard from the meeting:
munu
/Thorsten

Numbers in Agda

June 26th, 2009 by Thorsten

I was talking today about the project of Li Nuo, our summer intern. The goal is to develop a library for numbers in Agda which initially covers the integers and the rational numbers, and in the long run also the reals and the complex numbers. In particular, the aim is to establish basic algebraic properties, such as that the integers form a ring and the rational numbers a field.

Instead of defining the operations on a canonical representation of numbers, I suggest defining them first on a setoid and then defining a normalisation operation from the setoid to the canonical representation. The laws of normalisation are the usual ones we know from the lambda calculus (e.g. look up James’s and my paper on big step normalisation). As I only realised after the talk, it is not actually necessary to define the operations on the canonical representation – they can simply be lifted from the operations on the setoid.

As we know, this doesn’t work for the reals because there is no canonical representation. A simple way to see this is to notice that all sets definable in ordinary Type Theory (without quotients) have the property that if two elements are observationally equal then they are equal, where by observationally equal we mean that all functions into the booleans agree on them. However, this is not the case for the reals, because the only functions from Real to Bool are the constant functions.

Here is the whiteboard from the talk:
numbers

Turning amortised into worst-case

May 8th, 2009 by Nils Anders Danielsson

Today I raised an open question related to my paper Lightweight Semiformal Time Complexity Analysis for Purely Functional Data Structures. The paper discusses a library/type system for Okasaki-style analysis of amortised time complexity (in a call-by-need setting). Assume that the library is used to show that a given program has a given amortised time complexity. The question is whether this program can be automatically turned into one with (nearly) the same complexity, but in a worst-case sense.

Work stealing and implementation of runtime systems

March 16th, 2009 by Graham

Peter talked about work stealing and the implementation of runtime systems. We started by briefly discussing a simple implementation of a runtime system and the scalability problems that arise [3]. There was a brief look at the optimality of work stealing [1], before we went straight into the new paper on runtime support for multicore Haskell [2]. At the end we concluded by questioning a premise that seems common: that each task is executed exactly once. There is a nice new paper called “Idempotent Work Stealing” [4] which changes the premise to “each task is executed at least once” and shows by microbenchmarks that puts and takes on this queue cost only 60% of those of the Chase-Lev algorithm [5].

[1] Scheduling multithreaded computations by work stealing
[2] Runtime Support for Multicore Haskell
[3] Erlang Multicore
[4] Idempotent Work Stealing
[5] Dynamic Circular Work-stealing deque