Assignment 7v3, CSC430, Winter 2006

February 16, 2006

1  Assignment

In this week's assignment, you will construct a parser for the MALGOL language, using the LR parser generator built into drscheme. Then, you will tie together your lexer, your parser, and your solution to assignment 4 to produce a working evaluator that reads a file from disk and evaluates it.

2  Modifications to the Lexer

In order to use the built-in parser generator, you will need to make some superficial changes to the lexer. In particular, the parser generator only works for tokens declared with its define-tokens and define-empty-tokens forms. The first defines tokens with associated information, like our string literals and identifiers. The second defines tokens like left-paren and begin (that is, keywords) that have no associated information.

After defining a group of tokens with, say, define-tokens, you can use the procedure token-<name-of-token> to construct the given token.

Here's an example:

;; import the parser library:
(require (lib "lex.ss" "parser-tools")
         (lib "yacc.ss" "parser-tools"))

;; Let's declare some tokens
(define-tokens my-tokens (ID STRING))

;; let's make an ID token:
(token-ID "my-identifier")

;; let's compare two of them:
(equal? (token-ID "my-id") (token-ID "my-id"))

Note that the choice of all-caps letters for token names makes the grammar rules (you'll write them later in this assignment) easier to read. In essence, we're capitalizing the terminals of the grammar.

Mercifully, the resulting structures may be compared with equal?, as the example shows. This means that you can amend your test cases to work with the new tokens. Please do so.

Also, you will have to add one new identifier-like keywords to your lexer, the keyword "in".

If you have any questions about the meaning of define-tokens or define-empty-tokens, please read the documentation.

Make sure this part is working before going on to the next part.

3  Parsing

Next, formulate a parser for the MALGOL language, using the parser-tools library. The grammar for the parser is defined as follows:

program ::= procedure ... expression
procedure ::= PROCEDURE name params ; funtype BEGIN local-vardef ... IN expression END
params ::= name
         | < params , params , ... >
funtype ::= < type -- type >
type ::= name
       | < type , type , ... >
       | funtype
expression ::= expression expression ...
             | name
             | < expression , expression , ... >
             | NUMBER
             | STRING
             | TRUE
             | FALSE
             | IF expression THEN expression ELSE expression ;
             | (expression)
             | BEGIN local-vardef ... IN expression END
local-vardef ::= type params = expression ;
               | procedure ;
name ::= ID

There are several things to be aware of when reading this grammar.

  1. In this assignment, I use the words `procedure' and `function' interchangeably. The language keyword used is `procedure', but the result of evaluating a `procedure' form may be called either a `procedure' or a `function.'

  2. All of the capitalized names refer to literal tokens, and all of the lower-case names refer to nonterminals defined in the grammar.

  3. The symbols (such as the semicolon, the equals sign, the comma, the paren, etc.) are literal tokens.

  4. Ellipses (that is, ``...'') are used to mean ``zero or more of the preceding item.'' This meaning is stretched somewhat by their use in comma-separated lists, where they are taken to mean not ``zero or more commas'', but rather ``zero or more items in a comma-separated list.''

  5. The first expression rule may be confusing. This rule describes the language form for application. That is, a function may be applied to an argument just by placing them next to each other. For instance, the expression "add1 3" would evaluate to 4, if the add1 function were defined as below.

  6. Application must be left-associative. That is, the expression "f x y z" is parsed in the same way as "(((f x) y) z)". To put it another way, the result of evaluating this function is achieved by applying the function f to the argument x. The result of this must be a function, which is applied to the argument y. The result of this must be a function, which is applied to z.

  7. The begin form includes a set of local definitions, including both simple variables and procedure declarations. These should be parsed as nested where clauses. So, for instance, an expression like begin int x = 3; int y = 4; in + <x,y> end would be parsed as a where clause with a binding for x whose body is a where clause with a binding for y whose body is an application of plus.

  8. Parenthesess are used for simple grouping, not (as in Scheme) to signal an application. If you wanted to apply the function f to the result of applying the function g to 3, for instance, you would have to write f (g 3), because application is left-associative. Note that you are free to sprinkle parens around any expression. In particular, you could also write this expression as (((f ((g) (3))))), and it would mean the same thing.

You must produce the function malgol-parser, which has the following contract, header, and body:

;; malgol-parser : ( -> token) -> (cons (listof fundef) (cons Exp null))
(define malgol-parser
    (parser
...
))

Your job is to fill in the (parser ...) part. That is, use the parser form in the parser-tools library to define the parser. The output of the parser must be a list containing two items: first, a list of fundefs, and second, a single Exp. These refer to the data definitions we used in Assignment 4. I will repeat those data definitions here:

  ;; An Exp is one of : 

  ;; (simple literals:)
  ;; - a number
  ;; - a boolean
  ;; - a string

  ;; (variable references: )
  ;; - a symbol

  ;; (primitives: )
  ;; - (make-prim Schemefun)
  (define-struct prim (impl))

  ;; (tuples: )
  ;; - (make-tuple (listof Exp))
  (define-struct tuple (exps))

  ;; (variable binding: )
  ;; - (make-where exp Pattern Exp)
  (define-struct where (body pat rhs))

  ;; (function definition: )
  ;; - (make-fundef (symbol Pattern Exp))
  (define-struct fundef (name params body))

  ;; (function application: )
  ;; - (make-funcall (Exp Exp))
  (define-struct funcall (fun args))

  ;; - (make-if-s exp exp exp)
  (define-struct if-s (test then else))
  
  ;; a Pattern is one of 
  ;; - a symbol
  ;; - (listof Pattern)

You may notice that this data definition makes no mention of types. For this assignment, the parser should parse the types ...and then throw them away. That is, the types don't make it into the parsed AST. Of course, they must still be well-formed for your test cases to work. I suggest the use of the identifier any to indicate the most general possible type.

Also, no parsed program will include instances of the prim expression. That's all right, your programs can use the primitives defined in the top-level-env. In order to work with our new language, you'll need to make a few minor changes to the top-level-env. Section 4 describes these changes in more detail.

3.1  Using the Parser Generator

You will need to refer to the documentation for the parser collection. The most direct way to access this documentation is to choose Help>Help Desk, click on the link labeled ``Manuals'' (it's there, keep looking), and then on the link labeled ``Parser-tools collection''.

In order to import the parser-tools libraries, include this near the beginning of your file:

(require (lib "lex.ss" "parser-tools")
         (lib "yacc.ss" "parser-tools"))

Also, you will likely be lost until you take a look at the examples provided with the collection. You can find them inside the PLT folder, at collects/parser-tools/examples. I suggest taking a look at the ``calc.ss'' file and running it to see how it works.

Much of the grammar provided above will translate line-by-line into rules of the parser you write. The challenge will come in the rules that include ellipses, since the parser does not provide a ... feature. Instead, you will need to define rules (much like you did on assignment 5) to define exactly how these sequences are to be parsed. The other tricky rule has to do with parsing the expression expression ... form. You will almost certainly need to add an extra non-terminal in order to get things to parse cleanly. Also, keep in mind that application is to be parsed in a left-associative manner.

I recommend using the (debug "filename") feature of the parser-generator to understand what's going on, particularly when you run into conflicts. The tables generated by the debug form are like the ones we used in class. They define a set of states, and indicate what transition to make when a given token appears.

Start with simple grammars, and build up to more complex ones. As always, write your test cases first. If you can't write the test case, you can't write the program.

3.2  Examples

All right, here are some examples.

Example 1

procedure factorial x ; <any -- any>
  begin in
    if equal <x,0> then
      1
    else
      * <x,factorial (- <x,1>)>
    ;
  end

factorial 6

The result of parsing this should be:

(list (list (make-fundef 'factorial 'x
                         (make-if-s (make-funcall 'equal
                                                  (make-tuple (list 'x 0)))
                                    1
                                    (make-funcall '*
                                                  (make-tuple
                                                   (list 'x
                                                         (make-funcall
                                                          'factorial
                                                          (make-funcall
                                                           '-
                                                           (make-tuple (list 'x 1))))))))))
      (make-funcall 'factorial 6))

Example 2

Here's an example featuring local variable bindings:

procedure fordax <y,z> ; <<any,<int,any>> -- <any,any>>
  begin
    <int,int> z = <3,4>;
    procedure inner z ; <<int,any> -- <any,any>> begin in + <z,3> end;
    any q = 14;
  in
    z q
  end

fordax <brug,zorb>

The result of parsing this should be

(list 
 (list 
  (make-fundef
   'fordax
   '(y z)
   (make-where
    (make-where (make-where (make-funcall 'z 'q) 'q 14) 
                'inner 
                (make-fundef 'inner 'z (make-funcall '+ (make-tuple (list 'z 3)))))
    'z
    (make-tuple (list 3 4)))))
 (make-funcall 'fordax (make-tuple (list 'brug 'zorb))))

4  Evaluation

In order to link the parser with the evaluator, you will need to make two small changes. First, you will need to alter the set of primitives in the top-level-environment slightly. Second, you will need to add a new whole-program evaluator that accepts a list containing a list of procedures and a single expression, and returns a Value.

4.1  Changes to the Top-level Environment

Only two changes are necessary. First, the name of of the equality-testing primitive must be changed from equal? to equal, because our lexer does not support the use of question marks in identifiers. Second, you should add bindings for -, lt (less-than), and gt (greater-than). The - operator should subtract when given a two-tuple of numbers, and signal an error otherwise. The lt and gt primitives should produce a boolean when given two numbers, and signal an error otherwise.

4.2  Adding a program evaluator

Since our language is intended to allow mutually recursive procedures at the top level, we need some way to construct a top-level environment that contains a closure for each of the procedures. Moreover, each of these closures must contain a reference to this same top-level environment. The trick we used before, of adding the recursive binding to the environment at the last minute (that is, when the procedure is called) is not convenient here. This means that we must construct a circular environment.

The easiest way to do this is to build a list of closures with bogus environments, use these to build the top-level environment, and then go and mutate each of the structures to refer to the environment we just constructed. Here's how I did it:

  ;; prog-eval : (list/c (listof fundef) Exp) -> Value
  (define (prog-eval input)
    (let* ([procs (car input)]
           [body (cadr input)]
           [will-be-closures (map (lambda (x) (make-closure x null)) procs)]
           [new-env (fold env-extend top-level-env (map fundef-name procs) will-be-closures)])
      (for-each (lambda (clo) (set-closure-env! clo new-env)) will-be-closures)
      (p-eval body new-env)))

You don't need to do it in exactly this way -- you'll need to figure out how to define or import fold -- but you will probably want to use the set-closure-env! procedure, which changes the environment associated with a closure. This procedure is another of the ones that is automatically generated by the define-struct. So, for instance, if you write

(define-struct foo (bar baz))

... you can now use the procedures make-foo, foo-bar, foo-baz, foo? ...and also the functions set-foo-bar! and set-foo-baz!, which change the bar and baz associated with a given foo. It's true, I didn't tell you about these functions before. Mutating structures will get you into a lot of trouble, and your life is simpler when you don't do it. Nevertheless, there are times (like this) when it's the simplest way to get the job done.

Finally, for those of you with non-working assignment 4 evaluators, you may use the one that I've provided as a solution. To find it, follow the link from the Assignments page that allows you to check on the status of your submissions.

As always, you must provide test cases for all of your functions.

5  Tying it all together

You must write the function malgol-file-evaluator, with the following contract and header:

;; malgol-file-evaluator : string -> Value
(define (malgol-file-evaluator filename)
  ...)

This function takes its argument (a filename, represented as a string), creates a lexer for it, parses it, then passes it to the evaluator. It's a very simple function. Don't write or test it until the other pieces are all working.

6  Due Date

This assignment is due at 12:00 PM (noon) on Monday, February 27th. It will be worth 9 points, rather than 6. Get started soon.

Last modified: Monday, March 20th, 2006 11:22:15am