Simula 67 is just a few years from its 50th birthday. It was a long way ahead of its time. One of the things that made it possible to do all the things it did, like simulated or real time concurrency, was that it didn't use the hardware's simple but efficient stack. Instead it threaded the stack on the heap. This made it easy to have multiple stacks going simultaneously. [However it was stupid that, at least when I used it, it didn't return popped memory to free, but just left it around for the garbage collector.]
This seems like it would naturally make it a lot easier to have programs containing laziness, concurrency and parallelism, with separate processors working on separate subexpressions. Forty years ago the extra overhead of threading the stack was too inefficient (and we were pleased when C came along, even though it was much less powerful). But now I think it would be ok. Even better if memory technology supported it.
[update 7-Sep-2014: My old brain finally remembered that in Simula67 an object was just a function that left its local variables and inner procedures lying around after it finished. Also you could transfer pointers (refs) to local stuff without problems because these stack areas wouldn't be GCed if there were pointers into them. So there were substantive reasons why it worked as it did.]