The aim of this project was to implement a macro expansion mechanism and make it available to the user.
A macro is a sort of abbreviation that you can define once and then use later.
Macros allow programmers to extend the syntax of a language by adding new language constructs defined in terms of simpler ones.
When the arguments, if any, to a macro call have been collected, the macro is expanded, and the expansion text reread in case another expansion is required. The expansion text from one macro call might result in more macros being called.
A simple example: if A expands to `b', and B expands to `Hello world', the input a
will expand first to `b', and is then reread and expanded into `Hello world'.
Macro processing is implemented in several different languages.
There are three types of macro in the C pre-processor.
A simple macro is a name that represents a fragment of code.
Before a macro can be used it must be defined with `#define'. `#define' is followed by the name of the macro and then the code it is abbreviating.
e.g.
#define FILE_SIZE 1000defines a macro named `FILE_SIZE' as an abbreviation for the text `1000'. If somewhere after this `#define’ there is a C statement of the form:
mem = (char *) xmalloc (FILE_SIZE);then the C pre-processor will recognise and expand the macro `FILE_SIZE', resulting in
mem = (char *) xmalloc (1000);
The use of all upper case for macro names is a standard convention. Programs are easier to read when it is possible to tell at a glance which names are macros.
Normally, a macro definition must be a single line, like all C pre-processing directives.
There is little restriction to what can go in a macro body. The body need not resemble valid C code.
The C preprocessor scans your program sequentially, so macro definitions take effect at the place you write them.
After the preprocessor expands a macro name, the macro's definition body is appended to the front of the remaining input, and the check for macro calls continues. Therefore, the macro body can contain calls to other macros.
A simple macro always stands for exactly the same text, each time it is used. Macros can also accept arguments. Arguments are fragments of code supplied each time the macro is used. These fragments are included in the expansion of the macro according to the directions in the macro definition. A macro that accepts arguments is called a function-like macro because the syntax for using it looks like a function call.
To define a macro that uses arguments, a `#define' directive is used with a list of argument names in parentheses after the name of the macro. The argument names may be any valid C identifiers, separated by commas and optional whitespace.
E.G. a macro that computes the minimum of two numeric values:
#define min(X, Y) ((X) < (Y) ? (X) : (Y))
To use a macro that expects arguments, the name of the macro is written followed by a list of actual arguments in parentheses, separated by commas. The number of actual arguments you give must match the number of arguments the macro expects.
The expansion text of the macro depends on the arguments you use. Each of the argument names of the macro is replaced, throughout the macro definition, with the corresponding actual argument. Using the same macro `min' defined above, `min (1, 2)' expands into
((1) < (2) ? (1) : (2))where `1' has been substituted for `X' and `2' for `Y'.
Likewise, `min (x + 28, *p)' expands into
((x + 28) < (*p) ? (x + 28) : (*p))
After the actual arguments are substituted into the macro body, the entire result is appended to the front of the remaining input, and the check for macro calls continues. Therefore, the actual arguments can contain calls to other macros.
The macro body can also contain calls to other macros. For example, `min (min (a, b), c)' expands into this text:
((((a) < (b) ? (a) : (b))) < (c) ? (((a) < (b) ? (a) : (b))) : (c))
If you use the macro name followed by something other than an open-parenthesis, it is not a call to the macro, and the pre-processor does not change what you have written. Therefore, it is possible for the same name to be a variable or function in your program as well as a macro, and you can choose in each instance whether to refer to the macro (if an actual argument list follows) or the variable or function (if an argument list does not follow).
Such dual use of one name could be confusing and should be avoided except when the name is both a macro and a function and the two have similar effects.
You may not define the same name as both a simple macro and a macro with arguments.
There are standard predefined macros which are available with the same meanings regardless of the machine or operating system being used.
Their names all begin with double underscores.
E.g.
__FILE__ which expands to the name of the current input file
__LINE__ which expands to the current input line number
__DATE__ which expands to a string constant that describes the date on which the pre-processor is being run
In Lisp, a macro definition is a user defined transformation that tells how macro calls can be transformed into simpler forms.
Macro calls in LISP follow the pattern:
(<macro keyword> <argument>*)
(defmacro inc (var) (list 'setq var (list '1+ var))) => inc (macroexpand '(inc r)) => (setq r (1+ r))
A macro call looks just like a function call in that it is a list which starts with the name of the macro. The rest of the elements of the list are the arguments of the macro.
Evaluation of the macro call begins like evaluation of a function call but instead of evaluating the elements of the call list and using the results as arguments, the macro arguments are the actual expressions appearing in the list call. They are not evaluated before they are given to the macro definition. In contrast, the arguments of a function are results of evaluating the elements of the function call list.
Having obtained the arguments, Lisp invokes the macro definition just as a function is invoked. The argument variables of the macro are bound to the argument values from the macro call. The macro body executes and returns its value just as a function body does.
Another difference between macros and functions is that the value returned by the macro body is not the value of the macro call. Instead, it is an alternate expression for computing that value, also known as the expansion of the macro. The Lisp interpreter proceeds to evaluate the expansion as soon as it comes back from the macro.
Because of LISP's simple syntax, and because LISP programs and data can be expressed in the same form, LISP-like languages are easily adapted to the addition of a macro mechanism.
Evaluation of a program involves two logical steps. The program is first converted into an intermediate language using macro-expansion, and then the result of macro expansion is evaluated.
M4 is a text processing language widely available on UNIX. It is a pure macro processor, in the sense that it copies its input to the output, expanding macros as it goes. Macros can be either built in or user-defined, and can take any number of arguments. M4 can be used either as a front-end to a compiler, or as a macro processor in its own right.
| Consequences which can lead to trouble | Ways to avoid them |
|---|---|
| Evaluating Macro Arguments Repeatedly | The expansion should evaluate each macro argument once. |
| Local Variables in Macro Expansions | Local variable bindings in the expansion require special care |
| Evaluating Macro Arguments in Expansion | Don't evaluate them; put them in the expansion. |
| How Many Times is the Macro Expanded? | Avoid depending on how many times expansion is done |
This project has involved extending the set of expressions with iteration, extending the base language with assignment and sequencing and developing a macro expansion mechanism to be made available to the user.
When extending the set of expressions and the base language we have tried to use existing classes where possible rather than creating new ones. Implementing these has involved adding new tokens to the lexer TLL.lex and defining actions for them in the parser TLL.yacc.
Iteration can be implemented by using the existing ‘rec’ and ‘if’ in the parser.
While, for and do loops can all be implemented in a similar way as they all basically do the same thing.
Proposed Syntax: (while condition body)
Where condition is a term and body is a sequence of terms.
The following demonstrates how a while loop could be constucted:
(def while (rec loop (c d) ( if c (d loop (d)))))where c is the condition and d is the body.
However this will not work with the current implementation of the TLL parser because a condition can not be passed and the if statement requires you to supply an alternative.
This functionality can be added to the language by changing the parser.
The following shows an implementation that uses a simple counter to show how the concept would work:
TLL? (def while (rec gloop (c d) ( if (= c 10) (+ d 105) ( gloop (+ c 1) (+ d 1)))))
result:
{Closure: (c d ) (if (= c 10 )(+ d 105 )(gloop (+ c 1 )(+ d 1 )))<Env: gloop <Env: = <Env: / <Env: *
<Env: - <Env: + <Env: while {env_nil}>>>>>>>}
This will count from c to 10, adding 1 to d in each loop. Finally 105 will be added to d.
For example:
TLL? (while 1 10) result: 124Tests the while from 1 to 10 going through nine iterations.
Proposed Syntax: (for start end increment body)
Where start, end, and increment are terms and the body is a sequence.
The following demonstrates how a for loop could be implemented:
(def for (rec fl (c i d) ( if c (+ d 100) (fl c i (+ d i)))))
c is the start condition, I is the increment and d is the body.
The following shows an implementation that uses a simple counter to show how the concept would work:
TLL? (def for (rec fl (c e i d) ( if (= c e) (+ d 0) ( fl (+ c i) e i (+ d 1)))))
result:
{Closure: (c e i d ) (if (= c e )(+ d 0 )(fl (+ c i )e i (+ d 1 )))<Env: fl<Env: = <Env: / <Env: *
<Env: - <Env: + <Env: for {env_nil}>>>>>>>}
e is the end conditionTLL? (for 0 10 2 0) result: 5
Syntax: (assig a b)
Where a is a variable and b is a string, integer or variable.
Added a token for assignment to the lexer: TOK_ASS.
Definitions were added to the parser, TLL.yacc. At first we used the syntax (a := b) but to be more consistent with TLL syntax, this was changed to (assig a b).
This was implemented using a let binding, with a as the first parameter, b as the second, and 0 as the body.
There were three options for assignment:
We realised that the let binding would not be suitable, as the variable would only be assigned the value in the body of the let, which was empty.
The definition was changed to use def instead. The variable to be assigned to is the first parameter, the value is the second. The variable will now hold that value for the duration of the program, unless it is reassigned.
Assignment was added to the list of things a term could be.
Syntax: (seq term term …… term)
Sequencing uses a similar method to that used by recursive functions, but instead of writing a new class, uses the existing let binding class.
Sequencing is useful as it allows terms to be performed in sequence in the body of other functions. It acts like the {} in C and begin – end in Pascal and allows for larger programs to be written more easily.
Added a token for sequencing to the lexer: TOK_SEQ
Added definition to the parser, TLL.yacc, for the sequencing. Decided to implement it in a similar way to the recursive function. The word sequencing was added to the list of possible things a term could be.
Sequencing was defined to be TOK_OPEN_PAREN TOK_SEQ sequence TOK_CLOSE_PAREN
Sequence was defined to be a term or a sequence followed by a term. This means that it is recursive, reducing itself until a single term remains.
The sequence is implemented using let binding. Each term in the sequence is assigned to a dummy variable in the let, with the body being the next let in the sequence. The final let will have the dummy variable as its body.
So writing
(seq term1 term2 term3 term4 term5)would be equivalent to writing
(let v5 term5 (let v4 term4 (let v3 term3 (let v2 term2 (let v1 term1 v1)))))
When this was first tested it was thought that the terms were being evaluated in the wrong order. On attempting to rectify this, it did not seem possible. However with further testing, and really thinking about what was going on revealed that the terms had been evaluating in the correct order all along.
As the recursion takes terms off the end of the sequence, the first term encountered will be the last term of the sequence. This is required to ensure that the terms are evaluated in the correct order. In the above let bindings, the first term to be evaluated is the last one, term1.
To ensure that the parser compiles, all the parameters of the let binding need to be of type TLL_term*. The terms and the sequences are of this type, but the dummy variable was a problem. A new variable of type TLL_var* was created and initialised to a string such as "v1". As TLL_var is a subclass of TLL_term, the variable could be used in the let binding as the dummy variable.
The process of macro expansion is a useful way of removing duplicate blocks of code from a program, and defining it at a single point. This differs from a function definition primarily because the expansion, in a compiled language, is done at compile time, rather than runtime, as with a function call. It is achieved by pure text substitution before the code itself is compiled.
Now when compiling a C program, for example, the code is run through a pre-processor, which looks for directives, such as #define and #if, among others. The one that concerns us with macro expansion is #define.
As an example in C, we could define the following macro:
#define max(a,b) (a>b?a:b)
When the pre-processor sees a macro call to max, it replaces the entire call, from the opening '#' to the closing ')' with the expansion, substituting the parameters labelled in the definition with the arguments given in the call.
The advantage of this over an ordinary function call is that because the code is generated inline, there are no instruction pointer jumps to another part of the program, and no stack operation are involved in passing parameters, both of which will reduce program execution.
The disadvantage is if a macro is called recursively from with another macro, which generates some sort of loop, the length of which cannot be determined at compile time, then it is not possible to generate correct or valid code. This should be checked for and prevented.
The way macros have now been implemented in TLL is very similar to GCC, insofar that there is now a pre-processor, called "tllpp". This takes input from stdin, writes the output to stdout, making the relevant macro substitutions, which can then be piped into the interpreter.
The TLL macro definitions are also made in a similar way to C. For example:
define add(a,b) (+ a b)and it is called with a '#' to label the beginning of the call:
(let z #add(a,b) ... )
There was a certain amount of difficulty in determining whether or not a given word was a macro call, or anything else, which is why each call is preceded by '#'.
The pre-processor works in two passes. First, lines containing macro definitions are discarded, since they are not part of the TLL languages, and the expansions and parameter list are saved into an element of an array of structures. On the second pass, strings preceded by the hash symbol are compared against the list of macro definitions until a match is found. Then the parameters are parsed, substituted into the expansion, which then replaces the entire macro call. Lines which contain no definitions or calls are output normally.
The source code can be found here:
tllpp takes input from stdin and puts output to stdout so pipes can be used to run it without creating an intermediate file.
E.g. cat file|tllpp|byvalue
This is an example of a macro for factorial:
define fact2(x) ((let fact (rec f (n) (if (= n 0) 1 (* n (f (- n 1 ))))) (fact x ))) #fact2(9)
The following shows the result of the macro expansion of the above:
(let fact (rec f (n) (if (= n 0) 1 (* n (f (- n 1 ))))) (fact 9 )) end
Something else, which was easy to implement in the pre-processor, was the usage of comments. TLL did not have any implementation of comments, but the pre-processor will remove any lines which begin with '//'.
Haven't supported macros similar to
#define FILE_SIZE 1000in C because it may cause problems in determining whether the macro is an argument macro or a name macro.
If we had more time we would:
Previous Page