The code repository here is beginning to become populated. If you go to the ecl-ssa tree view, and in particular you focus on the tag 2008.09.20, corresponding to this post, you will find already a few files.. One is ssa.lisp and it will be the file responsible for loading the rest of the compiler now and in the near future. I could have chosen to use ASDF, but since this is going to be integrated into the ECL build process, that does not seem like a sensible idea.
The second file is ssa-defs.lisp. This one contains the object definitions for the compiler structures. You may notice that I have currently chosen to use CLOS for these structures. That is for ease of work. At some point we will have to move back to lisp structures, or even something simpler, for most of this code is to run much before the whole of Common Lisp is loaded. As far as we keep things well encapsulated, hide all uses of make-instance and other details of the inheritance, this should pose no problem afterwards.
The third thing you will notice is that there are quite many objects right now.
+ named-object
+ reference
+ variable
+ lisp-variable
+ global-variable
+ lisp-block
+ lisp-tagbody
+ lisp-function
+ lisp-macroexpander
+ lisp-macro
+ lisp-symbol-macro
+ code-block
They split into two families. First we have objects that can be referenced. These objects are created by lisp forms during the translation process, they are stored in an environment structure (a list
) and become accessible to other, inner forms. This is the largest family and here we find local and global variables, labels, blocks, tagbodies, functions and macros. Can you think of any other objects you may need? I would be surprised since those are already more than what the current ECL compiler uses.
Then we have the data structures which are used by the SSA compiler. Right now these only include an object named CODE-BLOCK that groups statements together and which implements the nodes of the code flow graph. Unlike in the previous version of the lisp2c compiler, we will not create a data structure for statements. Right now I find it more convenient to keep the intermediate representation in lisp form, instead of dealing with structures. At least experience in the previous compiler showed that the latter leads to uglier code.
Another minor detail is that we handle names ourselves. We have a named-object class that contains a symbol and a suffix, and which is used to create many versions of the same name. This is useful for creating anonymous variables, environment variables, labels, tagbodies, etc. All in all, objects which not only have EQ identity, but which we also want to be distinguishable when printed. We could have used gensym for that, but that really leads to ugly code, since we cannot control the numbering.
Then we have another file, ssa-env.lisp, which contains all functions for manipulating the compilation environment. We distinguish two sets of functions. First there are those that allow us to enlarge and environment with new objects or to look symbols up. These are the functions used during the translation process to convert symbols into variable references, find out which is the block a RETURN statement belongs to, etc.
Then there are functions which use this environment for macro-expansion. These ones are a bit cumbersome because they have to call MACROEXPAND or EVAL, both functions which expect the environment object to have the old format. Thus you find MAKE-MACROEXPAND-ENVIRONMENT and MAKE-EVAL-ENVIRONMENT, which will tell you a bit more about how environments are handled right now. Please do not be scared about the messy format for the old legacy environments, they are just simple enough for C to process them without much effort
Finally I have prepared some simple tests for the environment. This suite has to be enlarged, for there are many subtle issues that have to be fixed before we move on. For instance, we have to test that a MACROLET definition is processed in an environment that contains all previous MACROLET definitions. And by this I do not mean at macroexpansion time, but that all previous MACROLET definitions apply to the body of the MACROLET definitions as well. Well, lots of things that were not so easy to get right at the beginning with neither the interpreted nor the C compilers. Help is welcome on this point.