Posts from September 2008.

Subtle mistake with environments

We have just started and there is already a mistake. I must admit that I had envisioned already that problem in the previous post. The trick is that a MACROLET and a SYMBOL-MACROLET form must take into account outer defitinitions when creating the new definitions. For instance, here

(macrolet ((a () 5))
  (macrolet ((b () (a)))
    (b)))

in order to expand the definition (b), the form (a) is evaluated at compilation time and thus we need the definition of that macro to be present. The appropriate fix has been added together with a new test case.

Data structures, the environment and some tests

The code repository here is beginning to become populated. If you go to the ecl-ssa tree view, and in particular you focus on the tag 2008.09.20, corresponding to this post, you will find already a few files.. One is ssa.lisp and it will be the file responsible for loading the rest of the compiler now and in the near future. I could have chosen to use ASDF, but since this is going to be integrated into the ECL build process, that does not seem like a sensible idea.

The second file is ssa-defs.lisp. This one contains the object definitions for the compiler structures. You may notice that I have currently chosen to use CLOS for these structures. That is for ease of work. At some point we will have to move back to lisp structures, or even something simpler, for most of this code is to run much before the whole of Common Lisp is loaded. As far as we keep things well encapsulated, hide all uses of make-instance and other details of the inheritance, this should pose no problem afterwards.

The third thing you will notice is that there are quite many objects right now.

+ named-object
   + reference
      + variable
         + lisp-variable
            + global-variable
         + lisp-block
         + lisp-tagbody
         + lisp-function
   + lisp-macroexpander
      + lisp-macro
      + lisp-symbol-macro
+ code-block

They split into two families. First we have objects that can be referenced. These objects are created by lisp forms during the translation process, they are stored in an environment structure (a list :-) ) and become accessible to other, inner forms. This is the largest family and here we find local and global variables, labels, blocks, tagbodies, functions and macros. Can you think of any other objects you may need? I would be surprised since those are already more than what the current ECL compiler uses.

Then we have the data structures which are used by the SSA compiler. Right now these only include an object named CODE-BLOCK that groups statements together and which implements the nodes of the code flow graph. Unlike in the previous version of the lisp2c compiler, we will not create a data structure for statements. Right now I find it more convenient to keep the intermediate representation in lisp form, instead of dealing with structures. At least experience in the previous compiler showed that the latter leads to uglier code.

Another minor detail is that we handle names ourselves. We have a named-object class that contains a symbol and a suffix, and which is used to create many versions of the same name. This is useful for creating anonymous variables, environment variables, labels, tagbodies, etc. All in all, objects which not only have EQ identity, but which we also want to be distinguishable when printed. We could have used gensym for that, but that really leads to ugly code, since we cannot control the numbering.

Then we have another file, ssa-env.lisp, which contains all functions for manipulating the compilation environment. We distinguish two sets of functions. First there are those that allow us to enlarge and environment with new objects or to look symbols up. These are the functions used during the translation process to convert symbols into variable references, find out which is the block a RETURN statement belongs to, etc.

Then there are functions which use this environment for macro-expansion. These ones are a bit cumbersome because they have to call MACROEXPAND or EVAL, both functions which expect the environment object to have the old format. Thus you find MAKE-MACROEXPAND-ENVIRONMENT and MAKE-EVAL-ENVIRONMENT, which will tell you a bit more about how environments are handled right now. Please do not be scared about the messy format for the old legacy environments, they are just simple enough for C to process them without much effort :-)

Finally I have prepared some simple tests for the environment. This suite has to be enlarged, for there are many subtle issues that have to be fixed before we move on. For instance, we have to test that a MACROLET definition is processed in an environment that contains all previous MACROLET definitions. And by this I do not mean at macroexpansion time, but that all previous MACROLET definitions apply to the body of the MACROLET definitions as well. Well, lots of things that were not so easy to get right at the beginning with neither the interpreted nor the C compilers. Help is welcome on this point.

What we have, what we want, how to get it

Before getting into the task of coding one should sit down and think clearly not only about the objectives, but also about what we have achieved so far and what are the problems that the current ECL codebase faced and solved.

So, first of all, how does ECL compile and execute code right now? We have a redundant architecutre, with two compilers: one that transforms lisp code into bytecodes and another one that uses C as a backend. The first one is barely a single pass compiler: code is macroexpanded and, once expressed with a subset of Common Lisp, it can be directly translated into a set of bytecodes that match the structure of that language. While extremely simple, the development from scratch of such compiler taught this guy some lessons
More… »

Blog opening

This post opens a blog related to programming, but with a strong focus on a particular project, namely, to write a compiler for Common Lisp that can be used in the implementation I maintain, ECL.

This project was born one week ago, with a post I wrote to the mailing list. As I mentioned there, “one sometimes needs an additional motivation to keep working.” In my case the motivation is learning, not just polishing code, and writing compilers has always been my pet subject.

Before you continue reading any of my posts, beware: I am a selft taught programmer. I have taken no courses on programming from a university, though back in my youth I got a title as Functional Analyst. What does this mean? It means that I will write things that may be obvious to some people, things which are wrong, I will rediscover algorithms that everybody knows about… But if you do not care about this, I will be more than happy to “show you the code” as I write.

Welcome aboard!