Another important couple of phases have been implemented. These implement the creation of the code flow graph and several stages of simplification of this graph.
The call flow graph is a graph that describes the different execution paths of a program. There are many possibilities for encoding the CFG. A very low level form would be based on linked lists, with one node per statement. This seems very flexible and consistent, but in my opinion it introduces far too many connections, many more than are actually needed. It makes discovery of code units not so obvious.
My current choice is to collect statements into blocks and build a graph of the connections among these blocks. Each connection will amount to a jump statement, which can have one, two or more destinations, that can be unconditional or dependent on a variable. This information —the destination and the type of jump—, will be encoded in the block itself.
Let us consider the following piece of code as an example:
;;; (LET ((X 1)) ;;; (TAGBODY ;;; BEGIN ;;; (LET ((X 1)) ;;; (DECLARE (SPECIAL X))) ;;; (GO BEGIN)))
The current algorithm is driven by the function create-call-flow-graph, located at ssa-blocks.lisp. The first step in split-blocks is to take the list of SL statements and group those statements into block. A block is defined to be the statements that lay between two labels, or between a label and a jump statement, the latter included. At this step we make sure that every block has a name, which is either the label it started with, or a newly created label. After this phase we will obtain the following list of blocks:
(CODE-BLOCK [:L2-2] NIL NIL) (JMP :L1-1) (CODE-BLOCK [:L3-3] NIL NIL) (CODE-BLOCK [:L1-1] NIL NIL) (CODE-BLOCK [:BEGIN] NIL NIL) (GLOBAL-BIND X 1) (UNBIND-GLOBAL X) (JMP :BEGIN)
The notation should be obvious. The beginning of a block is marked by a list containing the symbol CODE-BLOCK, followed by the name of the block, and the incoming and outgoing connections (none at this stage). After this list we find the statements that actually belong to the block.
Only after locating the blocks, we find out their connections, in the function discover-connections. Each block has two kind of connections: incoming, from other blocks, and outgoing. The outgoing connections can be of four kinds: implicit, to the blocks that simply followed us; explicit unconditional jumps; explicit JMP-IF-NOT jumps and explicit JMP-NTH jump tables generated for TAGBODY.
;;; Connecting NIL and [:L2-2] ;;; Connecting [:L2-2] and [:L1-1] ;;; Connecting NIL and [:L1-1] ;;; Connecting [:L1-1] and [:BEGIN] ;;; Connecting [:BEGIN] and [:BEGIN] (CODE-BLOCK [:L2-2] NIL ([:L1-1])) (CODE-BLOCK [:L3-3] NIL NIL) (CODE-BLOCK [:L1-1] ([:L2-2]) ([:BEGIN])) (CODE-BLOCK [:BEGIN] ([:BEGIN] [:L1-1]) ([:BEGIN])) (GLOBAL-BIND X 1) (UNBIND-GLOBAL X)
Note that the connections have been established and that there are many empty blocks which only connect blocks among each other, having no statements. We can get rid of them in the phase eliminate-empty-blocks, taking care, of course, that the structure of the graph is preserved. In the previous case this all simplifies to a single block.
;;; Function name: ANONYMOUS-1 ;;; Required arguments: 1 ;;; Needs closure: NIL (CODE-BLOCK [:BEGIN] ([:BEGIN]) ([:BEGIN])) (GLOBAL-BIND X 1) (UNBIND-GLOBAL X)
So far we have advanced a lot. It now seems reasonable to go back into testing mode, adding checks for the generated and optimized code. The most sensible way, in my humble opinion, is to write a flexible interpreter that understands and executes our intermediate language, either in the form of lists or in the CFG form.