Friday, December 08, 2006

Eliminate the infinite!

Well - maybe not eliminate it - but at least push it back indefinitely!

But before I get to that, there were at least two other brainf*ck (BF) self-interpreters in my collection that I overlooked in the initial testing of my own self-interpreter (as described in my previous post) - one by Dean Scarff (bfi446.b) and another by "Keymaker" (kbfi.b). However, I've since tested both of those and it seems that mine is still the "fastest". :-)

I also sent an email to all the authors of these various other BF self-interpreters telling them about my quest for a BF self-interpreter eigenratio, hoping that perhaps someone might be tempted to create an even faster version. Dean Scarff and I have exchanged a couple of emails since then and his comments have motivated me to try to collect/clarify some of my half-baked thoughts about aspects of "eigenratio theory", such as it is.

Meanwhile, I've also been trying to find faster ways to run BF programs, and experimenting with some other ideas for improving cgbfi1.b (my first BF self-interpreter) further. I ended up looking at bfc, a BF to C optimising compiler, and although I haven't really used this much but some quick tests seem to show it is the new leader in the "fastest way to run a given BF program" competition. However, Oleg Mazonka has also been working on making a faster interpreter and has posted some new code and results on his BF page, so perhaps bfc might also be overtaken later.

So, back to the "eliminating the infinite"...

I have had some doubts (on and off) over recent times about whether any particular BF self-interpreter would end up with an eigenratio or not. "Infinite memory" and a distance (from current position) related cost for accessing memory were probably the main reasons for my doubts. So now I want to try to outline my reasons for renewed confidence (long may it last!) that a BF self-interpreter will (like the OISC/subleq interpreter and um.um from the ICFP 2006 contest and those for my own toy languages) also eventually turn out to have an eigenratio. (And if I'm wrong, then this post will probably haunt me forever!)

In essence I believe we can find an eigenratio for a self-interpreter in a bounded memory version of a BF machine. Then we can expand that machine to arbitrarily larger sizes and the eigenratio will not change. Thus the eigenratio for the same self-interpreter running on the "infinite memory" (Turing complete) BF machine is also the same.

A more detailed version of my reasoning follows below. (This was significantly modified on the 10th, and again on the 11th of December, in a possibly futile attempt to make it more readable and understandable!)

1. A self-interpreter for a deterministic Finite State Machine (FSM) has an eigenratio (which can be calculated from a corresponding square matrix).

The meaning of a particular language can be described in terms of a state transition system where each state represents a particular snapshot of some 'machine'. (See: operational semantics.) If the underlying machine is 'finite' in all respects and deterministic, then we dealing with a deterministic FSM, and if the language also only supports a finite number of "instructions" (for example, there are eight in BF, although handling some such as the input instruction would require some additional variations) then there must also be a finite set of transitions possible between those states.

A self-interpreter emulates this machine - to some degree at least. Ideally it would emulate the machine exactly with every state and transition in the "real" machine being faithfully mirrored in the emulated machine. However, on a finite machine the emulation must have a somewhat smaller set of states and transitions, because some resources on the underlying machine are being consumed by the interpreter in order to represent the program it is running and to take care of any other bookkeeping that it has to do.

The "execution" of a single "instruction" in the emulated machine causes a particular (emulated) transition to occur at that level. The self-interpreter handles this by executing some sequence of instructions causing a matching sequence of transitions on the underlying host machine. Let's also say that each transition has some fixed positive cost (execution time) associated with it, possibly a different value for different transitions.

From all this, we can (in a theoretical sense at least!) construct a matrix describing the "inner workings" of the self-interpreter (as described in earlier posts) with exactly one row for each transition in the underlying machine, and one column for every transition in the emulated machine. We must also make this into a square matrix by adding sufficient "dummy" zero-filled columns for the transitions that don't actually exist in the emulated machine even though they do exist in the underlying machine (and may be used by the self-interpreter in various ways). Finally, we can reorder the rows and columns so that those same zero-filled columns (and matching rows) appear at the bottom (for the rows) and at the right (for the columns) of the matrix. (This makes things easier to follow later.)

So now we have a large square matrix, with a matching row/column for every possible transition. The dominant eigenvalue of this matrix must also be the eigenratio for the self-interpreter, even if we cannot actually stack an "infinite" number of interpreters in this finite machine! By considering each possible transition as separate entity (instead of trying to group them into relatively few "variations") we are effectively using the "highest resolution" version of a matrix possible for that self-interpreter.

Note that because of the zero-filled columns on the right (corresponding to the transitions in the underlying machine that can't be emulated due of a lack of resources), those columns and the corresponding rows don't contribute to the value of the eigenratio. Therefore the smaller square matrix consisting of only the first N rows and columns, where N is the number of transitions in the emulated FSM, will also produce the correct eigenratio (dominant eigenvalue), even though that smaller matrix also can't be used to calculate the full number of transitions "executed" in the running of any particular program via the interpreter.

If we needed (in a theoretical sense only!) to use the "full" matrix (the one including the extra zero-filled columns) to do something like calculate the exact number of transitions "executed" when interpreting some other program, then the vectors of counts will also have to include a matching number of zero values corresponding to those "impossible" (in the emulated machine) transitions, but everything else works essentially as described in earlier posts.

2. Now consider a reduced version of our BF machine, where instead of unbounded memory it has a finite amount of data cells and also a finite range of possible values in each cell of the data, and also an upper bound of the length of the program that can be run, but still large enough to run some particular self-interpreter. This version of a BF machine is therefore a FSM instead of being Turing complete, and (from 1) the self-interpreter has some fixed eigenratio and corresponding matrix in that machine.

In a finite version of the BF machine, a "state" represents one unique snapshot of the entire data array plus the position of the data pointer. For all of these states (except those representing situations where the data pointer is already at the rightmost data cell), there will be a valid transition for the '>' instruction, taking us to another state where the data pointer has moved to the "right" by one but the contents of memory haven't changed at all, and similarly with all other instructions that are valid at that point. There would need to be multiple transitions from each state for the input instruction, one for each possible value that could be input.

3. Now we show that we can expand the size of this smaller BF machine (by increasing the number of number of memory cells and/or increasing the range of values each cell can hold) without essentially changing the behaviour of the same self-interpreter when it is used in the expanded machine to run any program that was also able run in the smaller machine from step 2.

As we add more memory cells (or increase the range of values each cell can hold) we are adding new states to our machine. Therefore there will also be new transitions (some from the "edges" of the state transition graph (for the smaller FSM) connecting to some new states, and others between various pairs of the newly added states). However, we can consider the states (and transitions between them) from the smaller machine to be unchanged as long as we extend the meaning of each state to include all of the newly added memory cells (or newly added memory bits in each data cell) being zero. (The language, BF, hasn't changed.) Thus the matrix for the self-interpreter in this larger machine can be constructed so that it consists of a copy of the matrix from the smaller machine that is then extended with new columns to the right and new rows at the bottom. Also, at least some of the dummy columns filled with zeroes in step 1 will now be for transitions that are emulated properly in the larger machine and so these will now generally have different values. But importantly, we also know that the elements in the new rows added at the bottom will be zero where they appear in the columns corresponding to the properly emulated transitions in the smaller machine. (The first N columns from that smaller machine matrix, using the same value for N as in part 1.)

Why? Remember that each column essentially describes how one transition in the emulated machine is mapped onto some combination of transitions in the underlying machine. This means the new values added to the bottom of any of those first N columns from the matrix for the smaller machine must correspond to transitions in the underlying machine that were not needed in the emulation of the transition corresponding to that column in the smaller machine, and so won't be needed in this larger machine either.

Now consider a program that could be run in the interpreter in the smaller BF machine. In other words it only used a subset of those N transitions. In this expanded machine, the vector representing how often each transition in the underlying machine is used during execution of the program is the same as what it as in the smaller machine for those first N elements, then extended with zeroes for the additional transitions in the expanded machine that it does not use. The same thing applies to the vector representing the start-up section of the self-interpreter.

Thus when we apply the same matrix operations as described in earlier posts using the expanded matrix and vectors, we get essentially the same results (but also with additional zeroes padding things out in various vectors). Thus the total execution time for any such smaller program when run on top of a fixed size stack of our self-interpreter in the expanded machine must be the same as what it was in the smaller machine, and hence the eigenratio must also be the same, at least when we consider the set of smaller programs that could be run on the smaller BF machine.

Really, all of this is a long winded way of showing that a small program that could run in the interpreter on the smaller machine will also run in the interpreter on the larger machine using exactly the same states and transitions as it did previously which may seem rather obvious anyway!

4. But this also means the same self-interpreter must have the same eigenratio for any valid programs in this larger machine! For if it had a different eigenratio when running one of the larger programs (one that couldn't be run in the smaller machine but can be in the expanded machine) we would then be contradicting the result from step one that says a self-interpreter on some particular FSM has a fixed eigenratio for all programs that can be run in that machine.

So now we see that when a reduced BF machine (but still big enough to run a self-interpreter) is expanded to a larger "size", the same self-interpreter must still have the same eigenratio in the new machine, even though it could now also be running some programs that require a larger amount of memory and/or larger range of values in each data cell.

5. We can repeat this expansion of a smaller BF machine into a larger version, as often as we need and by as much as we need, without changing the eigenratio of any particular self-interpreter that could run on the smaller version. Hence we can make a BF machine arbitrarily "big", and from this conclude that a self-interpreter running on the Turing complete version (with an "infinite" number of data cells and/or "infinite" range of values in each cell) will exhibit the same eigenratio.

When we are looking at a Turing complete version of BF, there are countably infinite states (and transitions) to consider. But we can still (in a theoretical sense at least) construct a kind of matrix that corresponds exactly to a self-interpreter on that machine by allowing each column to extend down forever, and each row to extend to the right forever. Similarly the vector of costs is infinite, and so is the vector of counts corresponding to the execution of any particular program (one that also halts). In other words, exactly like the finite dimensional approach use previously but extended (to the "right" and "down") to infinity. Then using an extended notion of matrix multiplication we can use this infinite matrix and related vectors to calculate exactly the number of state transitions required to run any particular program through the self-interpreter. We'll be summing an infinite sequence of products but there are only a finite number of non-zero values in each case.

Finally (with my tongue firmly in my cheek), if anybody has any doubts about any of this, we will now also have enough memory to actually build an infinite stack of copies of our self-interpreter to test that the theoretical eigenratio (dominant eigenvalue of the corresponding matrix) exactly matches the observed one (time taken for N+1 stacked copies of the self-interpreter to run any other valid halting program, divided by the time taken for N stacked copies to run the same top level program)!

Time to get back to finding that eigenratio...