Saturday, April 18, 2015

Use and meaning of Types

Before my simplistic take on Types, here is some history:

Types in programming languages come at us from two very different directions: the brutal reality of programming physical computers; and the rarefied realm of the foundations of Mathematics (starting with Bertrand Russell's theory of Types). Not surprisingly the collision of the two currently taking place in modern programming languages is not a very comfortable event.

In the beginning of programming (Assembly language, BCPL) there were groups of bits, and the programmer needed to know what they meant so that sensible things happened. But there was nothing to stop you doing floating point adds on integers, or using 0 (or other integers) as memory addresses. Then languages gradually added types to avoid simple errors and facilitate conversions. Algol68 nearly got it right, but 2 big errors (no closures and combining pointers and variables) led to a general loss of interest in getting it right. This led to brilliant but hacky languages: C, C++, Java, Python, and more.

Meanwhile the Mathematicians struggled to understand various sorts of logic and associated Type theories. There were various attempts to build programming languages on these ideas leading to current languages such as Haskell and ML. Then came software to support the development and verification of mathematical proofs. These turned out to be similar to programming languages. There was also an increasing need to prove the correctness of programs. So this was a natural way for Types from the mathematical side to force their way into real world programming. Very recently we see the creation of Homotopy Type Theory (HoTT) which seems to bring the automation of mathematical proof and the verification of software a lot closer.

Wombat comes very much from the practical programming side. It can be seen as an effort to fix Algol68 and add some modern things. I am struggling to understand the Mathematical side of Type Theory, because it is particularly important to incorporate proofs of correctness to Wombat. I'm quietly confident (hopeful) that Type Theory types can be expressed correctly using Wombat's types.

A traditional view (e.g. in C) is that Types describe how data is laid out in memory. This goes along with the pre-functional style of programming where all names are variables (leading to the extreme position in Java and many other languages where the poor programmer has to deal with mutable pointers to blobs full of more mutables). This then leads into the quagmire of distinctions between value types and reference types.

A functional-oriented approach is to view types as the string of bits that get put on the stack when a value of the type is being passed to a procedure (assuming a completely unoptimized execution model), plus a collection of primitive operations for that type which programmer and compiler need to keep in mind. This is where Wombat plans to be for primitive types, except that there are also axioms for those primitive operations. The actual types that a programmer is more likely to use in Wombat are defined by their operations and associated laws (which are proved true from the Implementation type's operations and laws which are axioms).

Only primitive types remain at run time, which is not to say that values of those types are actually computed. For example we create a mutable (=assignable) Int64 on the stack by:
`x = loc Int64 ()
But the resulting memory address is a fixed distance from the frame pointer and we expect the compiler to just take a note of that and generate the address on the fly whenever necessary.

However the idea of types that really motivates a lot of Wombat is this:
The type of an expression is just everything the compiler knows about it.
This lets us look at a program upside down. The natural view is that during execution of the program, expressions take on values which become grist for the mill of other expressions. The upside down view is this:

  • We start by building the program's abstract syntax tree (AST), and then execute that symbolically to create what we might call an Abstract Execution Tree (AET). 
  • The AET cannot follow recursive calls too far or it will be infinite. It has to be lazy from some depth. Or maybe every optional bit that might never be executed is lazy.
  • We work out everything that can be worked out before any execution takes place. At this point the program can't make further progress till it runs and interacts with the outside world.
  • Initially and then between each interaction we evaluate expressions. Each expression we evaluate must narrow the range of possible values of some other expression: otherwise there was no point in evaluating it and the evaluation engine shouldn't have done so.
So instead of the compiler generating code to execute the program, it generates code to do optimal JIT specialization of the whole program. The final specialization completes program execution. One fondly imagines that this process can be highly parallel.

This view of types blurs the distinction between static and dynamic typing. Indeed it is somewhat inspired by Yin Wang's essays on this. Sadly he seems to decide that his old essays aren't quite right and deletes them.

No comments:

Post a Comment