The Wombat Programming Language: 2024

Tuesday, December 3, 2024

The case for Atoms and automatic conversions

Nearly every programming language uses dot notation, with expressions such as x.y. Yet the y here is not a value, and this makes it rather unlike other expressions. I have a simple remedy.

Atoms are having a revival. Elixir is a popular language which uses them. My proposal is that atoms are created by a dot followed by letters. So .y is an atom constant. And just as f z is a function call, the same as f(z), so x.y is a function call, the same as x(.y).

The problem is that x is not usually a function, but rather some sort of structured object with named subcomponents. The solution is for x to have an automatic conversion to the right function that will pick out the required subcomponent. This is a function that will be simplified at compile time with no run time overhead. The syntax that generates our structured object can autogenerate the required conversion declarations.

The order lattice of types described in the previous post (https://wombatlang.blogspot.com/2024/11/the-case-for-subtypes-real-world-and.html) generates automatic conversions between types which have a non-empty Intersection. The conversion always goes down to the Intersection (which might fail) then up to the destination.

Ad hoc automatic conversions can be declared, but only if the source and destination have an empty Intersection. That will be the case for the conversions we want from our structured object to the function that accepts our specific atom argument.

I hope you agree that it is nicer if everything is just procedure calls with no magic special meanings.

Saturday, November 30, 2024

The case for Subtypes: real world and programming

[This is cross-posted from my general blog: https://grampsgrumps.blogspot.com/.]

Subtyping fits naturally with our human view of reality. Consider these sets of individuals: Dogs, Cats, Humans, Crocodiles, Mammals, Quadrupeds, Animals. There is a natural relation between them represented by this diagram:

At a simple level these are just sets of individuals and the arrows represent set inclusion. But clearly there is more to it than that. The Dog type is distinguished by properties that let us identify an individual as being a dog. And we can see that there is a symmetry in the diagram. As we go up we get more and more individuals in the type, but as we go down we get more and more properties.

This perfectly natural form of subtyping is the right way to do subtyping in a computer programming language.

Now imagine a box that always has a dog in it, and we can change the dog inside to any other dog. Nobody would think that the type of such boxes was a subtype of boxes that have a mammal in it and we can change the contents to any other mammal. The mammal box isn't a dog box because it might have a non-dog inside, and a dog box isn't a mammal box because it isn't big enough for all mammals. They are just different.

Computer memory is like those boxes. Consider 16 bit integers. They can be naturally seen as a subtype of 32 bit integers, since every such can be represented in 32 bits, and 16 bit integers have the property that the integer fits in 16 bits. We can look at a 32 bit integer and decide if it fits in 16 bits. Now consider memory locations that have enough space for a 16 bit integer. These can't be used if what you want is a memory location that can hold a 32 bit integer. And the reverse is equally true: if you have a memory location with a 32 bit integer in it, then you can't use it where you expect a memory location with a 16 bit integer. Once again, they're just different.

Because computers work with changeable memory locations, many programming languages conflate values, such as integers, with boxes which can hold such a value. The first object oriented programming language was Simula 67 and it had classes that consisted of multiple memory-like boxes. It had subclasses which were a brilliant innovation, but they didn't work as we naturally expect, and that has been leading programmers astray, causing bugs, ever since.

Functional programming style emphasizes values rather than places for changing values. These value types naturally form a subtyping hierarchy that behaves the way humans expect. And it turns out that if you organise your type hierarchy correctly a lot of magical goodness results.

The rest of this post will get more technical, and belongs in my programming blog (https://wombatlang.blogspot.com/) where it will also be posted.

The new operator system

Syntax

The new operator syntax takes its lead from another language (Agda?), However that language's system is less comprehensive, so my solution is more messy.

Instead of a separate way of introducing operators, we just allow a special sort of identifier.

Wombat2 operators are identified by the sequence of mandatory suboperators (such as +, or if-then-else), plus identifying whether an operand is required or absent in each slot (after each, plus one to the left of the first suboperator). We also need the left and right precedence at the start and end, which are decimal numbers greater than 0.

To simplify matters I'm removing subsuboperators, and optional suboperators. Optional can be handled by just having multiple similar operators, though there is the danger of combinatorial explosion. These capabilities can be added to the scheme later if desirable. We do still need repeating suboperators.

Firstly we make the rule that a suboperator can't include any digits.

Now we construct our identifier as follows:

We have an initial backquote (`). This means two backquotes on first use, but first-use backquotes are not required at top (file) level where operators are likely to be defined.
We then have alternating numbers and suboperators, starting and ending with numbers.
A zero in a number spot means that no operand is present there. A non-zero number means that there is an operand required there. 1 should be used on internal operand positions.
The left and right numbers are precedence numbers. If a decimal dot is included it must have digits on both sides.
A repeating subop is indicated by following it with \*. This is zero or more times, so if you want one or more times then include the first instance separately. Note that if a repeating subop is the last subop then both it and the last mandatory need right precedence numbers.
\S is the whitespace suboperator and \J is the juxtaposition suboperator. \\ is a single \.

So we can just write:

`5+5.1 = myAddProcedure # greater right precedence means left associative
`0if1then1else3 = myIfProcedure

Of couse you aren't allowed to define new ones of these that are incompatible with an existing one, such as by having different precedence, or different repeating groups.

Translation

We prefer to associate operators with relevant Behaviours. Consider x+y. We convert x+y to Monoid.add(T).op(x,y). The question is how we work out the type T.

The answer is that we search downwards from the type of x and from the type of y, until we find a pair of types such that there is a type above both that conforms to Monoid.add.

Firstly we'll suppose that our types are: Integer, Rational and ComplexInteger (i.e. Gaussian Integer), with Integer below both of the others, and all conforming to Momoid.add. Suppose we try to add a Rational and a ComplexInteger. There is no type that is above both so we have to move down. There are two solutions: If we move Rational down to Integer then ComplexInteger is above (or equal to) both. If we move ComplexInteger down to Integer then Rational is above both. So we have two possible operations and either or both will work if one of x and y can coerce to an Integer (either because it is Rational with denominator 1, or ComplexInteger with imaginary part 0). So we combine the operations, giving a result that is a Union of Rational and ComplexInteger, and which fails if neither wants to be an Integer.

Now suppose there is a type ComplexRational, that is above Rational and above ComplexInteger. Then we don't need to go down at all. So we use Monoid.add(ComplexRational).op. This coerces both x and y up to ComplexRational and the addition happens there and always succeeds.

The plan is to do sensible things that are close to the way run-time typing works in dynamically typed languages.

Postscript

Note that Monoid.add means the same as Monoid(.add). Since .add is an atom, this means Monoid is a Proc(Atom,Behaviour). This lets us distinguish different ways that a type can be a Monoid. Presumably there will be a Monoid.mul and a Monoid.max and others.

Friday, May 3, 2024

Typed TLA+

As per the previous message, I think wombat2 is a good fit for TLA+, both at the pluscal level and the TLC/TLAPS level.

So I think a more useful thing to do than my current effort will be to create a typed version of TLA+, using my type system. I think it is possible to create a full general purpose programming language that contains TLA+ (approximately) as a part.

I plan to match the syntax very closely, within the limitation that my syntax system can't do backtracking.

Don't hold your breath. I'm guessing it will take years. If anyone wants to help please leave a comment. Left to my own devices I will plan to use: my syntax system; python; copy the TLAPS method of talking to backend solvers (however when there is a proof I eventually want to bring it into the language as wombat2-style code). I'm open to alternative plans (that don't mention java).

Firstly I need to get into actually using TLA+ for a while.

Saturday, April 27, 2024

TLA+ to Wombat2 correspondence

I looked at TLA+ a decade or so ago and didn't grok it. I looked at the wrong sort of documentation. This time I looked at Lamport's "science" technical document: https://lamport.azurewebsites.net/tla/science.pdf. Surprisingly TLA+ has a lot of similarities to Wombat2. Here is the approximate correspondence.

1. TLA+ Sets correspond to Types

In Wombat2 there is a lattice hierarchy of types that make them behave quite like sets. In TLA+ there are no types, but the way he uses sets, such as the set of Integers, corresponds closely to wombat2 types.

Lamport also claims that "everything is a set" using the ZF (Zermelo-Frankel) system. So 3 is a set, and maybe 2 is an element in 3. Such nonsensical questions don't arise in Wombat2. Yes, you could use ZF as the implementation of natural numbers, but that doesn't make properties of that implementation into properties of the type being implemented. This doesn't cause any conflict between TLA+ and wombat2 because TLA+ doesn't make any substantive use of this.

2. TLA+ mapping corresponds to Proc

On page 14 he talks of "mapping", giving * (multiplication) as an example that is defined for numbers. Since there is no type system to exclude non-numbers, we presume the mapping fails for non-numbers. And so this corresponds to Proc.

A mapping that always gives a boolean result is called a "predicate". This corresponds to a TProc(X,Bool). He gives "=" as an example, and in Wombat2 "==" is also defined for any pair of values.

3. Expressions and Formulas, meanings and interpretations

Numeric expressions using constants and arithmetic operations are distinguished from Boolean expressions, also called formulas. These exist in wombat2, of course.

Then on page 15 we see algebraic expressions and algebraic formulas. These have "variables" in them. Wombat2 reflection allows looking inside expressions. Identifiers and operators, which are compile time things, get converted to atoms which are runtime things. Wombat2 is more general, since Lamport is only talking about numbers. TLA+ writes the meaning of any of these with special (doubled) square brackets: ⟦exp⟧. Then an "interpretation" is a mapping from variable identifiers to values. In wombat2 the equivalent would be a record type mapping atoms to values. If ϒ is an interpretation in TLA+ then ⟦exp⟧(ϒ) is the value of the expression with that assignment of values to variables. Wombat2 can do something similar, but it is only in a part of the language not meant for ordinary programming.

Formulas are also similar to wombat2 proposition types. These are types that correspond to a a proposition, with a value of the type being a witness.

4. Subsetting corresponds to subtyping (including refinement/subset types)

TLA+ { x∈X | F(x) }, for some formula involving x, corresponds in Wombat2 to X/{ F%($) } where F% is a Type expression with an input argument of type X, and where values of X are included if the generated type has a value (witness).

5. Mappings and Functions

In TLA+, mappings have 0 or more untyped arguments. The defining expression may be only valid for arguments from some "sets". Mappings can cover things that are too big to be sets. In wombat2 things too big to be types are in a higher universe, but I haven't worked out how that exactly works, but currently there is no similar distinction.

A function in TLA+ has one input, that can be a tuple, and one output, which can also be a tuple. So corresponds pretty well to TProcs in wombat2. Where TLA+ writes v∈S↦exp, that corresponds to {'S=>T: `v=$; exp }. Functions(=TProcs) are values just like numbers in TLA+ and wombat2.

6. Lists, Tuples, etc

TLA+ has lists (in angle brackets) that are "ordinal sequences. Ordinals are "first, "second", etc. So TLA+ has indices starting at 1 so that the second item is item 2. Then it has ordinal sequences that are meant for things that happen starting at time 0, so these are 0 based, and the entries are separated by right arrows: →. The latter seems to correspond to the semi-colon separator between effects in wombat2.

7. Rendering effect-based programs as mathematics

We then come to the parts of TLA+ that are used in proving things about programs. This is the stuff that wombat2 would like to learn from.

The first trick is to divide the execution into steps. Then each variable turns into an array of values, giving the value at each step. Then the execution of the program can be expressed as logical facts about how these variables have values at each step. Wombat2 has a start in this direction with the TimeOrder values marking the start and end of effectful expressions.

Logical operations in TLA+ are extended with "always" (▢) and "eventually" (◊) and "always eventually" and "eventually always".

Monday, March 18, 2024

The Lattice Type System

I'm making very bad videos. Here's an intro to wombat2:

Here's the the one on wombat2's type system:

Sunday, March 10, 2024

Proposition types complement subtyping

The way wombat2 works, both at compile time and runtime, is that actions are taken to move an expression's known type down the hierarchy. When we get to a type restricted to a single value then we have evaluated the expression. When we get down to a type restricted to the empty set of values, then expression evaluation has failed.

A natural question is: why do we have types representing propositions, such as: x <% y. Why not just simple boolean expressions.

Suppose we have a type Dog, and it's a subtype of Animal. We have a value x of type Animal, and we have some proposition about it: say hasTail(x). Moving down the subtyping hierarchy can't invalidate a witness of this, or a witness of its negation. But if we can't decide until we've moved to a more specific subtype then the proposition and its negation both stay empty. Propositions can be evaluated at the right level of generality.

Saturday, March 9, 2024

Wombat reborn ... as Wombat2

I've decided to bring back the wombat name. Except that it is now wombat2 - no space before the 2. This will hopefully be easier to search for.

Future posts back at https://wombatlang.blogspot.com/.

Also: I've started posting videos on my PLodicy youtube channel: https://www.youtube.com/@RobertSmart/.

The Wombat Programming Language