Saturday, August 19, 2006

What's inside um.um....

A couple of days ago I thought I'd look at the self-interpreter (um.um, released by the organisers at the end of this year's ICFP Programming Contest) in detail to see if I could build the corresponding matrix (as described in an earlier post), and hence calculate the exact eigenratio for this interpreter (= dominant eigenvalue of that same matrix). Although I knew that my earlier thinking (based on experimenting with very small instruction sets) was probably simplistic in some respects, at the same time I hoped there would be some way around any issues that might arise in more complicated systems. As as it turned out, there were indeed some difficulties when dealing with the UM instruction set.

The matrix I finally obtained for um.um (with some "simplifications" which are described later in more detail) is:

| 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0 |
| 6, 6, 6, 5, 5, 5, 5, 3, 4, 4, 4, 3, 5, 3 |
| 1, 1, 1, 1, 1, 1, 1, 0, 2, 0, 0, 1, 0, 1 |
| 2, 3, 3, 3, 2, 2, 2, 2, 3, 2, 2, 2, 3, 3 |
| 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0 |
| 3, 3, 3, 3, 3, 4, 3, 1, 2, 1, 1, 1, 2, 2 |
| 6, 6, 6, 6, 6, 6, 7, 0, 4, 2, 2, 2, 4, 4 |
| 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0 |
| 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0 |
| 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0 |
| 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0 |
| 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0 |
| 2, 3, 3, 2, 2, 2, 2, 1, 2, 2, 2, 2, 3, 1 |
| 4, 8, 8, 4, 4, 4, 4, 3, 4, 3, 3, 3, 7, 4 |

The first row/column is for opcode 0 (Conditional Move) and then all the others follow in numerical order until you get to the rightmost column and bottom row which correspond to opcode 13. So, by reading down the first column (which corresponds to the sections of code executed in um.um whenever it is emulating a single opcode 0 from the next level up), we can see that um.um executes opcode 0 once, opcode 1 six times, opcode 2 once, opcode 3 twice, etc., etc. Reading across the rows shows us how often um.um is executing each instruction in the course of emulating other instructions. There are a number of rows that have just a single count of 1 and are 0 elsewhere. These rows correspond to instructions that um.um doesn't use except in a single place (in order to delegate the handling to the next level *down*). The first such row is for the multiplication instruction.

Anyway, the dominant eigenvalue for this matrix is 24.4669... which agrees very closely with what I had seen in my "experiments".

I obtained the matrix by first doing a basic kind of flow analysis of um.um, and then wrote a small tool that used the results of that analysis to sum the number of times different instructions appeared in the relevant sections of the interpreter.

But, as mentioned earlier, I ended up making some simplifications because um.um sometimes includes conditional branching and looping within the sections of code responsible for emulating a single instruction from the "next level up". In particular it does this for opcodes 1 and 2 (with different execution paths depending on whether the array id is zero or not) and also for opcode 12 (where as well as skipping the array copy when the source id is zero, it also has a loop (with a variable number of iterations) to copy the contents of the source array when the source array id is not zero). That branching and looping makes counting the instructions executed rather difficult!

At first I thought I could "decompose" the instructions that had multiple execution paths into some different "virtual" ops, each with their own column and rows in the matrix. But that idea didn't seem to work because you can't (in general) look at an arbitrary piece of UM code that (for example) is using opcode 1 (array index) and determine if it addressing array zero or not - at least not statically, because it depends on what register B holds at that time.

However, after thinking about this and looking at the code a bit more I decided that in this case (where um.um will "mostly" be running copies of itself) I could make some reasonable looking simplifications. It appeared that um.um is always addressing array 0 in its "own code" and doesn't ever replace its own array 0 with anything else. So the matrix above is simplified in the sense that the values in the columns for opcodes 1, 2 and12 correctly reflect only the execution path for when the behaviour controlling register is also zero - meaning register B for opcodes 1 and 12, and register A for opcode 2). In other words the simplified array should be "exactly correct" if we can guarantee the program being interpreted is another copy of um.um interpreting another copy of um.um interpreting another... (forever)!

My earlier experiments with inventing my own simple VM and instruction set and then writing a self-interpreter for it didn't expose this kind of issue and it was possible to derive the exact matrix for the interpreter directly and without any complications. Obviously the real world is messier! For something like the UM language and um.um self-interpreter it might still be possible to do some kind of recursive, iterative analysis first to converge on a final version of the matrix that is "accurate" for an infinitely high "stack" (maybe the word "tower" is better?) of interpreters interpreting each other. This is pure speculation, and I have no idea how to do that or if it is even possible. But if something like that is possible, then perhaps the simplified matrix I'd given above is in fact the same result for um.um, which it turns means the "eigenratio" of 24.4669... is also correct.

There are a bunch of other assumptions that I've made also. Even if you can construct the matrix for a given self-interpreter exactly, do we know it will always exhibit the "convergence" that is desired (as per the Power Iteration algorithm to find the dominant eigenvalue and corresponding eigenvector)? I suspect it does but haven't found any proof. The matrix is non-negative (all values >= 0) and the same applies to the other vectors that are part of the series of matrix operations that can be used to describe multiple interpreters "running each other" in a tower of arbitrary size. That may be sufficient to imply convergence. Any matrix theory experts out there?

Just for completeness, my examination of um.um's internals indicates the following:

000-009: Start-up (mostly NOPs in effect, as these platters ultimately used for as the registers of the next level up and perhaps the "execution finger" also).
010-017: Executed during start-up, but is also used to handle opcode 13.
018-027: Decode opcode and dispatch according to following table.
028-041: Addresses for code handling each of the 14 instructions.
042-056: Handle opcode 0.
057-065: Lead in for opcode 1, determine if reg B is 0.
066-077: Handle opcode 1 when reg B is 0.
078-089: Handle opcode 1 when reg B is not 0.
090-098: Lead in for opcode 2, determine if reg A is 0.
099-110: Handle opcode 2 when reg A is 0.
111-122: Handle opcode 2 when reg A is not 0.
123-136: Handle opcode 3.
137-150: Handle opcode 4.
151-164: Handle opcode 5.
165-178: Handle opcode 6.
179-179: Handle opcode 7 (nothing too complex here!).
180-191: Handle opcode 8.
192-196: Handle opcode 9.
197-201: Handle opcode 10.
202-206: Handle opcode 11.
207-215: Lead in for opcode 12, determine if reg B is 0.
216-221: Handle opcode 12 when reg B is 0.
222-254: Handle opcode 12 when reg B is not 0 (create a new array 0, and copy the contents of the source array to it).
255-255: Initialised with the value 2^28. (Corrected on 27 August.)

Register Usage:

Register 0 holds the "execution finger" for the program being interpreted.
Register 2 holds the value 7 (a bitmask for extracting register ids), except when emulating opcode 12 and the emulated register B is non zero.
Register 5 holds the value 18 (address for start of main interpreter loop) except when emulating opcode 12 and the emulated register B is non-zero.
Register 6 is zero all the time.

The simplified matrix given earlier therefore ignores the instructions in platters 78-89, 111-122 and 222-254. I'm pretty sure that when enough copies of um.um are running each other that this doesn't matter and so the above matrix gives the correct eigenratio. But you can't use the above matrix to get precise counts of instructions executed in other cases.