reorganize the code
on how cleanup drives further development
git: commit
When I started to clean up the code my approximate idea was to get rid of repetitive patterns in lispm.c and separate various parts of the VM. I didn’t anticipate any significant semantic changes.
I started by roughly splitting the code into several files. Then I
replaced repetitive pattern of list iteration with FOR_EACH
construct:
FOR_EACH(elem, sequence) { do_something(elem); }
While I was doing it, I ripped the benefit of reduced syntax noise and the fact that I once again iterated over significant portion of the code.
I noticed several places with C(C(name, val), next)
pattern, and that
itched – I felt that such objects could be better represented by
triplets T(name, val, next)
, except I used such triplets to store
properties of lambda. But the revelation followed: it is always possible
to distinguish the purpose of the triplet by the context. A lambda would
always be denoted ('LAMBDA lambda_triplet)
in the evaluation phase,
and I can use ('COND branches_triplet)
without confusing them during
evaluation. Furthermore, such triplets can also have the same linked
list structure as the CONS objects. This latter revelation is important
for two reasons.
One reason is that I use the pattern “build linked list, then reverse it
in place” on multiple occasions. For example, while iterating through
branches of conditionals we do T(cond, expr, previous_conditionals)
,
but as the very last step we reverse this list, because during
evaluation we want to consider conditionals in the order of their
appearance in the text.
The second reason of its importance has to do with the garbage collector. It is currently the only piece of lispm.c that goes into deep recursion: while performing its duty, it essentially rebuilds the list, originating at the current root of the stack 1, and does it by recursively calling itself for every element of the list. This of course gives a high risk of overflowing the runtime stack. Eliminating this problem not only for CONS-lists but also for things like triplet-lists of conditionals requires some common approach on representing these recursive objects.
Guided by these considerations I made the unthinkable: I changed the
representation of CONS :-) More specifically, I decreed that all
recursive structures will have the pointer to the next element at offset
zero at their stack location. So it is now CDR CAR
on the stack
instead of CAR CDR
. This convention allowed me to use exactly one
argument in list_reverse_inplace
function.
I also introduced the (apply)
special form, because I noticed several
things:
- At some stage of my refactoring the post-semantic-analysis program
would look like
(FORM arg)
, whereFORM
is one of the builtin syntactic forms (cond, quote, lambda,…). The only exception to this rule was the function application, and it lead to overly complex body ofeval
function. - I checked in
evapply()
whether builtin was a special form or not. With the new structure of the program, I know that it is never a special form inside(apply)
because every special form is represented by the pair listed in (1).
I deliberately made (apply)
not a valid literal such that a user
cannot invoke it directly, because I don’t want to accidentally
introduce late binding:
(let ((parms '(a 3))
(apply cons parms)) ; is `a` in the environment?
; what if list parms came from a completely
; different part of the program?
Having (apply)
as a special form, that can only be invoked internally
by the VM itself provides me enough control over what can be passed as
parms
list.
While doing all that, I noticed a bug in my implementation of letrec
:
(let ((c 3)) (letrec ((c c)) c)) ; evaluates to 3
(letrec ((c c)) c) ; unbound variable
The latter behavior is correct – a variable cannot be recursively
defined as itself 2, but I accidentally allowed it in the former
scenario. The fix was rather straightforward: before entering the
lamdba, created by the letrec
definition, I must mark all lambda
argument names unbound. This will not pose a problem for recursive
functions, because I don’t evaluate their bodies when binding them to
the names, but it will terminate the VM in both of the examples above.
Having done all of the above, I rearranged the code, renamed some types
(most notably Sym
is now known as LispmObj
) and made a pass through
the comments. One part that I don’t really like is the non-standard
dependency on linker for builtins, but I could not come up with an
alternative that would make me happier.
verbose branch logs
-
[fff6eea8] step 1: chunk API into logical blocks
sidenote: forcing lispm_cons_alloc always_inline blows up the size from 4044 to 4500+ bytes
-
[ea479a1e] step 2: reorganize builtins
I now allocate the top of the stack to hold the symbols for builtins.
I will likely later make it once again a property of
Lispm
object, and remove the GNU extensions and linker script. -
[a69666ab] step 3: further separate parts of the VM
lispm.c: core syntax of the language lispm-funs.c: core functions lrt0.c: further memory management functions
-
[787157f6] step 3: finalize the builtins
I gave up on the idea of making builtins independent of linker, as it is hard to then identify builtin within an extension module.
-
[9b24d706] step 4: introduce FOR_EACH to iterate over lists
-
[d9d9ab7c] step 5: introduce triplets
Reuse lambda stack word internally as a triplet object. In fact, the semantics of the triplet is always derived from the context, so I can even make it a part of public API.
-
[fa593206] step 6: introduce triplets, quads, and pentas
I noticed several
C(C(a, b), c)
patterns inlispm.c
which made me think. In the end I realize thatlambda
is not expressed by the triplet itself, but rather by the pairC(SYM_LAMBDA, triplet)
, and the triplet can be either prototype or closure. Nothing then prevents me from reusing the triplets for other purposes as well. -
[e2eac8fd] step 7: prep to change layout of stack objects
-
[cd771127] step 8: put “next” field at the head of stack obj
This simplifies
list_reverse_inplace()
and anticipated changes to GC. -
[34ccbf85] step 9: introduce ‘(apply)’ special form
This makes the program after semantic stage look very uniform: it is either an atom, or a pair
(form args)
, whereform
is a special form, and args are its parameters.In case of the
(apply)
form specifically the first argument can either be a builtin function, or a lambda, it can never be a special form itself, which simplifies the logic of evapply drastically. -
[e11b88f7] step 10: patch a bug in letrec
Previously
(let ((c 3)) (letrec ((c c)) c))
returned 3, because(c c)
bound the value to the one in the containing frame. This is wrong behavior, because expressions on the right hand side of assignments inletrec
should already refer to the objects defined inletrec
. In this particular example, the evaluation should cause an error (either infinite recursion, or undefined symbol).This can be achieved by unbinding all the
letrec
names before passing them into the implementation lambda. -
[33a6e50b] step 11: provide some structure to the directory
liblispm/ the source code tests/ tests build/ target directory for builds
Builds are now coming in four fashions: 00 01 10 11. 00 is non verbose, non assertive, optimized for size 11 is the opposite
-
[e661713e] step 12: massively rename types and functions