CSC 530 Lecture Notes Week 4

Introduction to the Formal Semantics of Programming Languages
Introduction to Attribute Grammars



  1. Reading: Papers 10 (Knuth) and 11 (Bochmann)

  2. What is Semantics?

    1. The meaning of a program.
    2. Very broadly, semantics can be expressed in one of two fundamental forms:
      1. In terms of how a program behaves -- i.e., how the program runs on some concrete or virtual machine. (Philosophically, the behaviorist says that a program "is what it does".)
      2. In terms of what a program denotes -- i.e., the program can denote some formal mathematical function or algebraic formula, the meaning of which we already understand. (Philosophically, the denotationalist says that a program "is what it is".)
    3. Semantics can also be defined in terms of what it is not. Viz., it is not syntax.
      1. Syntax expresses the structure of a program -- what it looks like.
      2. Semantics expresses the meaning of a program -- what it computes, including constraints on how the computation is performed.
    4. For efficiency reasons, semantic evaluation of programs is often subdivided into two phases:
      1. Static semantics (i.e., type checking), which defines semantic constraints that can be evaluated prior to program execution.
      2. Dynamic semantics (i.e., program execution), which defines the meaning of execution itself.

  3. How to Specify Semantics?
    1. Informal approaches:
      1. Free-form English (e.g., a paper in the literature, a textbook)
      2. Formalized English (ibid.)
      3. As (the output of) an informally developed compiler
    2. Formal approaches
      1. Attribute Grammars (Knuth)
      2. Denotational (Scott and Strachey)
      3. Axiomatic (Hoare)
      4. Algebraic (Goguen)
      5. Operational (you all)

  4. Why Formal Semantics?
    1. It provides a notation for systematic, machine-independent, rigorous language design (quite successful here).
      1. If the notation is good enough, it is a "BNF" for semantics.
      2. I.e., a good formal semantic notation can do for the definition of semantics what BNF has done for the definition of syntax -- made it formal and concise.
    2. It provides a formal definition for the purposes of translator implementation -- i.e., a contract between the language designers and implementors (reasonably successful here).
    3. It provides a basis for formal program verification (the only way to do this).
    4. It provides a formal reference for programmers (a notorious failure here).

  5. Common features of all formal semantic definition techniques
    1. Notational power and complexity (especially the denotational and algebraic styles).
    2. Syntax-directed.
    3. A semantic domain, which consists primarily of an environment and a store.
    4. Semantic "bootstrapping"
      1. All of the formal semantic techniques we will discuss start with a grammar definition.
      2. For an operational semantics, meaning is defined in terms of abstract behavior, which in turn requires the definition of some form of abstract interpreter.
      3. For a denotational semantics, meaning is defined using abstract mathematics. No interpreter is needed here; rather, we use the mathematical machinery of formal logic.
      4. The bottom line is that we can only define meaning in terms of what we already understand (e.g., an abstract interpreter or known formalisms of mathematics).
      5. Further, how well we accept such definitions is to some extent a matter of trust in the underlying formalisms.
      6. In this sense, a formal mathematical semantics is probably more trustworthy than a compiler/interpreter written in C.

  6. The role of functional programming languages in defining formal semantics.
    1. To the extent that a functional programming language is a mathematical means of expression, it can be used to express formal semantics.
    2. As we shall see, concepts and notations from functional programming are used extensively in formal semantic definitions.

  7. A brief overview of the major semantic definition techniques
    -- how meaning is expressed in each
    1. For each technique we consider:
      1. Language Semantics -- how the semantics of a full programming language are expressed
      2. Program Semantics -- how the semantics of a particular program are expressed
      3. Orientation -- to what uses in practice has the technique been put
    2. Attribute grammars
      1. The language semantics are expressed as:
        1. CFG (typically in BNF)
        2. a set of attributes
        3. one or more attribute equations associated with each production of the grammar.
      2. The program semantics are:
        1. A set of attribute values associated with each node of a parse tree, the topmost value set being the meaning of the entire program.
        2. The attribute values are obtained by some well-defined evaluation process involving a traversal of the tree.
      3. Orientation -- compiler writing.

    3. Denotational
      1. Language semantics are expressed as:
        1. a CFG (typically in a terse abstract syntax BNF notation)
        2. One or more semantic domains (a more formal description of attribute value sets).
        3. One or more semantic functions that map from syntactic forms (non-terminals) into elements of the semantic domains.
      2. Program semantics are a mathematical object (number, truth value, set, function) resulting from the evaluation of the semantic functions
      3. Orientation -- language design.
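
The three ingredients above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple encoding of abstract syntax, the names `meaning`, `Env`, and `Denotation`, and the choice of numbers as the semantic domain are all invented for this example and do not come from the readings.

```python
from typing import Callable, Dict, Tuple

# Abstract syntax (assumed encoding):
#   ("num", n) | ("var", name) | ("add", e1, e2) | ("mul", e1, e2)
Expr = Tuple

# Semantic domain: an expression denotes a function from environments to numbers.
Env = Dict[str, float]
Denotation = Callable[[Env], float]

def meaning(e: Expr) -> Denotation:
    """Semantic function: maps a syntactic form to an element of the semantic domain."""
    tag = e[0]
    if tag == "num":
        n = e[1]
        return lambda env: n
    if tag == "var":
        name = e[1]
        return lambda env: env[name]
    if tag in ("add", "mul"):
        left, right = meaning(e[1]), meaning(e[2])
        if tag == "add":
            return lambda env: left(env) + right(env)
        return lambda env: left(env) * right(env)
    raise ValueError(f"unknown syntactic form: {tag!r}")

# The expression a + b * 10, written directly as abstract syntax.
program = ("add", ("var", "a"), ("mul", ("var", "b"), ("num", 10)))
denotation = meaning(program)          # a mathematical object: a function
result = denotation({"a": 2, "b": 3})  # applying it to one environment
```

Note that `meaning(program)` is itself the program's semantics; no machine state or execution trace is involved until the resulting function is applied to an environment.
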
    4. Axiomatic
      1. Language Semantics are:
        1. CFG
        2. an axiomatic system composed of built-in axioms and rules of inference (typically based on first-order predicate calculus, FOPC)
        3. one axiom (roughly) for each production in the grammar, stated as logical formulae in the basic axiom system
      2. Program Semantics are:
        1. A mathematical formula (in FOPC) that is asserted to be true at some point in the program.
        2. The formula at the end of the program is the meaning of the entire program.
      3. Orientation -- program verification.
        1. The intent is to verify, based on the axioms, that a program in fact matches the assertions associated with it.
        2. This is a fundamentally different orientation than either of the previous two techniques.
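
For concreteness, axiomatic definitions in the style of Hoare are conventionally written as triples {P} S {Q}: if assertion P holds before statement S runs, then Q holds afterward. The notation below is the standard one and is not taken from these notes; it shows the usual assignment axiom followed by one instance of it.

```latex
\[
  \{\,P[e/x]\,\}\; x := e \;\{\,P\,\}
  \qquad\text{e.g.}\qquad
  \{\,x + 1 > 0\,\}\; x := x + 1 \;\{\,x > 0\,\}
\]
```

Here P[e/x] denotes P with every free occurrence of x replaced by e, so the instance is obtained by taking P to be x > 0 and e to be x + 1.
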
    5. Operational
      1. Language Semantics are:
        1. an abstract syntax
        2. the description of an execution state as a set of structured values (cf. denotational semantic domains)
        3. a set of instructions that change the state (i.e., values of elements of the current environment).
      2. Program Semantics are a set of snapshots that depict the execution of the program.
      3. Orientation -- compiler/interpreter writing; pedagogy.
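
A minimal sketch of the operational style, assuming a tiny stack machine: the state is a value stack plus a store, each instruction maps one state to the next, and the program semantics is the list of snapshots. The instruction names (`push`, `load`, `add`, `mul`) and the functions `step` and `run` are hypothetical, chosen only for this illustration.

```python
from typing import Dict, List, Tuple

# Execution state (assumed shape): a value stack plus a store of named values.
State = Tuple[Tuple[float, ...], Dict[str, float]]

def step(instr: Tuple, state: State) -> State:
    """Apply one instruction to a state and return the resulting state."""
    stack, store = state
    op = instr[0]
    if op == "push":                      # push a literal
        return stack + (instr[1],), store
    if op == "load":                      # push the stored value of a name
        return stack + (store[instr[1]],), store
    if op in ("add", "mul"):              # replace the top two values
        *rest, x, y = stack
        value = x + y if op == "add" else x * y
        return tuple(rest) + (value,), store
    raise ValueError(f"unknown instruction: {op!r}")

def run(program: List[Tuple], store: Dict[str, float]) -> List[State]:
    """Return the snapshots of the state before and after each instruction."""
    state: State = ((), dict(store))
    snapshots = [state]
    for instr in program:
        state = step(instr, state)
        snapshots.append(state)
    return snapshots

# a + b * 10 with a = 2 and b = 3, hand-translated to stack code.
program = [("load", "a"), ("load", "b"), ("push", 10), ("mul",), ("add",)]
snapshots = run(program, {"a": 2, "b": 3})
stacks = [stack for stack, _ in snapshots]
```

The store never changes in this example because the sketch has no assignment instruction; adding one would make the store part of each snapshot vary as well.
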

  8. Example: An attribute grammar for type checking expressions.
    1. Here we are defining the static semantics of expression evaluation.
    2. Here are the components of the definition:
      1. A "term-factor" BNF grammar for expressions.
      2. A string-valued attribute named type.
      3. A global list-valued attribute named env, containing pairs of the form (name, type). (More about global attributes in the next lecture).
      4. A set of semantic equations associated with the grammar rules that define how type is computed.
    3. Here are the grammar rules and equations
      E ::= E1 + T    E.type = (if E1.type = T.type
                                then E1.type else "ERROR")
      E ::= T         E.type = T.type
      T ::= T1 * F    T.type = (if T1.type = F.type
                                then T1.type else "ERROR")
      T ::= F         T.type = F.type
      F ::= ident     F.type = Lookup(env, ident).type
      F ::= real      F.type = "real"
      F ::= integer   F.type = "integer"
      
    4. Observations
      1. Abstractly, the "=" appearing in the equations is mathematical equality, not variable assignment
      2. The "=" can be interpreted concretely as assignment, as we shall see shortly.
      3. The semantic equations appear in the same position as "action routines" in a standard compiler grammar built with Yacc [1] or a comparable parser generator.
      4. The equations are in fact an abstraction of action routines coded in a programming language such as C.
        1. The semantic equations can convey precisely the same meaning as do action routines, but in a more abstract (and therefore more compact) form.
        2. Semantic actions can be used to express meaning that is computed in one or more parse tree traversals, such as interpretation, type checking, and code generation.
      5. In general, semantic actions express meaning in a syntax-directed framework
        1. That is, the semantic actions are invoked whenever the syntactic rule to which they are attached is applied.
        2. Unlike Yacc action routines, semantic actions are not restricted to a strictly bottom-up evaluation order.
      6. Semantic equations employ auxiliary functions as necessary.
        1. In the above example, we use the Lookup auxiliary function to obtain the type value associated with an identifier.
        2. The examples below discuss auxiliary functions further.
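
To make the correspondence between equations and code concrete, here is one possible transcription of the type-checking equations into Python. It is a sketch, not the notes' own implementation: the tuple encoding of parse-tree nodes, the rule labels such as "E+T", and the decision that an undeclared identifier has type "ERROR" are assumptions made for this example, and `lookup` is a stand-in for the Lookup auxiliary function.

```python
from typing import Dict, Tuple

Env = Dict[str, str]  # the env attribute: name -> type

def lookup(env: Env, ident: str) -> str:
    """Stand-in for Lookup; an undeclared name yields "ERROR" (an assumption)."""
    return env.get(ident, "ERROR")

def type_of(node: Tuple, env: Env) -> str:
    """Compute the type attribute of a parse-tree node from its equation."""
    rule = node[0]
    if rule in ("E+T", "T*F"):            # E ::= E1 + T   and   T ::= T1 * F
        left = type_of(node[1], env)
        right = type_of(node[2], env)
        return left if left == right else "ERROR"
    if rule in ("E:T", "T:F"):            # E ::= T   and   T ::= F
        return type_of(node[1], env)
    if rule == "ident":                   # F ::= ident
        return lookup(env, node[1])
    if rule in ("real", "integer"):       # F ::= real   and   F ::= integer
        return rule
    raise ValueError(f"unknown rule: {rule!r}")

# Parse tree for a + b * 10 under the term-factor grammar.
tree = ("E+T",
        ("E:T", ("T:F", ("ident", "a"))),
        ("T*F", ("T:F", ("ident", "b")), ("integer", "10")))

all_integer = type_of(tree, {"a": "integer", "b": "integer"})
mixed = type_of(tree, {"a": "integer", "b": "real"})
```

Each branch of `type_of` is a direct reading of one equation, which is the sense in which the equations abstract the action routines: the Python adds only the mechanics of walking the tree.
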

  9. Another simple example -- evaluating expressions.
    1. Attribute grammars can be used to convey any aspect of language semantics.
      1. In the above example we used equations to define expression type checking.
      2. In the next example, we use equations to define expression evaluation.
    2. The components of the definition are:
      1. A standard "term-factor" grammar for expressions (same as above).
      2. A numeric-valued attribute named val.
      3. A set of semantic equations that define how val is computed.
    3. Here are the rules:
      E ::= E1 + T    E.val = E1.val + T.val
      E ::= T         E.val = T.val
      T ::= T1 * F    T.val = T1.val * F.val
      T ::= F         T.val = F.val
      F ::= ident     F.val = GetVal(store, ident)
      F ::= real      F.val = read(val)
      F ::= integer   F.val = read(val)
      
    4. Observations
      1. As with the type checking equations, these equations are an abstraction of code that would perform evaluation.
      2. Here we use an auxiliary function GetVal to get the current value of an identifier from a value store.
      3. The other auxiliary function is read, which we assume lexically analyzes a particular real or integer literal and returns its value.
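
The evaluation equations can also be run in the style of parser action routines, computing val while the input is being parsed. The sketch below is one way to do that with a hand-written recursive-descent parser; it is an assumption-laden illustration, not the notes' method. In particular, the left-recursive rules are replaced by loops (a common recursive-descent technique that keeps + and * left-associative), the tokenizer is a single regular expression, and `get_val` and `read` are stand-ins for the GetVal and read auxiliary functions. Error handling is omitted.

```python
import re
from typing import Dict, List, Union

Number = Union[int, float]

def read(literal: str) -> Number:
    """Stand-in for read: convert a real or integer literal to its value."""
    return float(literal) if "." in literal else int(literal)

def get_val(store: Dict[str, Number], ident: str) -> Number:
    """Stand-in for GetVal: fetch the current value of an identifier."""
    return store[ident]

def evaluate(text: str, store: Dict[str, Number]) -> Number:
    tokens: List[str] = re.findall(r"[A-Za-z_]\w*|\d+\.\d+|\d+|[+*]", text)
    pos = 0

    def peek() -> str:
        return tokens[pos] if pos < len(tokens) else ""

    def factor() -> Number:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok[0].isdigit():
            return read(tok)              # F ::= real | integer
        return get_val(store, tok)        # F ::= ident

    def term() -> Number:
        nonlocal pos
        val = factor()                    # T ::= F
        while peek() == "*":
            pos += 1
            val = val * factor()          # T ::= T1 * F
        return val

    def expr() -> Number:
        nonlocal pos
        val = term()                      # E ::= T
        while peek() == "+":
            pos += 1
            val = val + term()            # E ::= E1 + T
        return val

    return expr()

value = evaluate("a + b * 10", {"a": 2, "b": 3})
```

Because val is purely synthesized, every value is available as soon as the corresponding subexpression has been parsed, which is why no explicit tree is needed here.
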

  10. Attribute evaluation using attributed parse trees
    1. The semantic rules of an attribute grammar are evaluated using a parse tree that is annotated with attributes, called an attributed tree.
    2. For example, here is an attributed parse tree for type checking the expression "a + b * 10", assuming a and b have both been declared integer.


      1. The labeled bullets by each tree node are attributes whose values are computed at those nodes.
      2. The env attribute at the root of the tree is a global attribute, whose value is accessible to all nodes.

    3. The evaluation of the type attribute is performed by applying the semantic equations to obtain the attribute values associated with each node of the tree.
      1. This is done by visiting each node of the tree in some order, and applying the semantic equation associated with the rule for that node.
      2. It is important to note that semantic equations themselves do not specify a particular order of tree traversal -- the equations only specify attribute dependencies.
      3. It is up to the attribute evaluator to choose a traversal order based on the dependencies that are defined; for now we'll use a postorder traversal.
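
As a rough sketch of such an evaluator, the Python below stores attributes on explicit tree nodes and fills them in with a postorder traversal, recording each step in a trace. The `Node` class, the `equation` and `evaluate_postorder` functions, and the treatment of undeclared identifiers as "ERROR" are assumptions made for this example. The tree built at the end is the parse tree of "a + b * 10" under the term-factor grammar.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Node:
    rule: str                                   # production applied at this node
    children: List["Node"] = field(default_factory=list)
    lexeme: str = ""                            # token text, for leaf productions
    attrs: Dict[str, str] = field(default_factory=dict)

def equation(node: Node, env: Dict[str, str]) -> str:
    """Semantic equation for node.rule; it reads only the children's attributes."""
    kids = [child.attrs["type"] for child in node.children]
    if node.rule in ("E ::= E1 + T", "T ::= T1 * F"):
        return kids[0] if kids[0] == kids[1] else "ERROR"
    if node.rule in ("E ::= T", "T ::= F"):
        return kids[0]
    if node.rule == "F ::= ident":
        return env.get(node.lexeme, "ERROR")    # env is the global attribute
    if node.rule == "F ::= integer":
        return "integer"
    if node.rule == "F ::= real":
        return "real"
    raise ValueError(f"unknown rule: {node.rule!r}")

def evaluate_postorder(node: Node, env: Dict[str, str], trace: List[str]) -> None:
    """Visit children first, then apply this node's equation."""
    for child in node.children:
        evaluate_postorder(child, env, trace)
    node.attrs["type"] = equation(node, env)
    trace.append(f"{node.rule}: type = {node.attrs['type']}")

a = Node("F ::= ident", lexeme="a")
b = Node("F ::= ident", lexeme="b")
ten = Node("F ::= integer", lexeme="10")
left = Node("E ::= T", [Node("T ::= F", [a])])
right = Node("T ::= T1 * F", [Node("T ::= F", [b]), ten])
root = Node("E ::= E1 + T", [left, right])

trace: List[str] = []
evaluate_postorder(root, {"a": "integer", "b": "integer"}, trace)
```

Postorder works here only because every dependency in this grammar points from a node to its children; the traversal is the evaluator's choice, while `equation` contains nothing about ordering.
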
    4. Let's now trace through an evaluation of the above attributed tree to see how evaluation works (done in lecture).
    5. Here's the result of the trace (in case you were asleep in or missed lecture).



    6. A similar trace can be performed on the following attributed tree for expression evaluation:


  11. Inherited versus synthesized attributes.
    1. Semantic equations specify attributes with two forms of dependencies:
      1. A synthesized attribute is one whose value is dependent on attributes below it in the parse tree.
      2. An inherited attribute is one whose value is dependent on attributes above or beside it in the parse tree.
    2. E.g., consider the following semantic rules
      X ::= Y Z       Y.a1 = X.a1
                      Z.a1 = Y.a2
                      X.a2 = Z.a2
      
      and the corresponding attribute dependency diagram


      1. The value of attribute Y.a1 is inherited down from X.a1.
      2. The value of attribute Z.a1 is inherited across from Y.a2.
      3. The value of attribute X.a2 is synthesized up from Z.a2.
    3. In the course of attribute evaluation, these forms of dependencies dictate how the parse tree must be traversed in order to achieve a complete evaluation.
      1. In an attribute grammar with only synthesized attributes, the evaluation can always be accomplished by a single bottom-up traversal of the tree, since synthesized attributes are always passed up in the tree during evaluation.
      2. When inherited attributes are added to a definition, the traversal order must be chosen such that when a particular attribute a is to be evaluated, the values of all other attributes on which a depends are known.
      3. In practice, with real programming languages, attribute dependencies are typically defined such that evaluation can be accomplished in one to three depth-first passes over the tree.
      4. We'll examine the details of this in the example at the end of the notes.
    4. Even though in practice attribute evaluation is done in ordered passes over the tree, it is important to remember that these passes are not defined by the equations.
      1. The attribute equations are strictly declarative.
      2. They can be evaluated in any order, as long as the dependencies are satisfied. [2]
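
The declarative reading can be sketched by letting the dependencies alone decide the evaluation order. In the Python below, the equations for Y.a1, Z.a1, and X.a2 are the ones from the rule X ::= Y Z above. The notes give no equations for X.a1, Y.a2, or Z.a2, so the ones shown are hypothetical and exist only so the example can run. The order is obtained from a topological sort using the standard-library `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# attribute -> (attributes it depends on, function computing it from known values)
equations = {
    "X.a1": ((), lambda v: 1),                       # hypothetical: supplied from outside
    "Y.a1": (("X.a1",), lambda v: v["X.a1"]),        # inherited down from X
    "Y.a2": (("Y.a1",), lambda v: v["Y.a1"] + 1),    # hypothetical rule inside Y
    "Z.a1": (("Y.a2",), lambda v: v["Y.a2"]),        # inherited across from Y
    "Z.a2": (("Z.a1",), lambda v: v["Z.a1"] * 10),   # hypothetical rule inside Z
    "X.a2": (("Z.a2",), lambda v: v["Z.a2"]),        # synthesized up from Z
}

def evaluate(equations):
    """Evaluate every attribute in an order that respects the dependencies."""
    graph = {attr: set(deps) for attr, (deps, _) in equations.items()}
    order = list(TopologicalSorter(graph).static_order())
    values = {}
    for attr in order:
        _, compute = equations[attr]
        values[attr] = compute(values)
    return order, values

order, values = evaluate(equations)
```

The dictionary lists the equations in no particular evaluation order; any listing would produce the same values. A circular dependency would make `static_order` raise `graphlib.CycleError`, which corresponds to an attribute grammar that has no valid evaluation order.
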

Footnotes:

____________________
[1] Familiarity with Yacc is not assumed in these notes; however, readers who have used Yacc or a comparable parser generator will notice similarities with attribute grammar definitions.
[2] Unless global attributes are used, about which more will be said next week.