CSC 530 Lecture Notes Week 3

A Brief Review of Lambda Calculus
Introduction to Programming Language Type Systems




  1. Relevant readings this week -- papers 6 through 9 on type systems.

  2. So what is this lambda calculus stuff?
    1. The readings on functional languages, as well as the readings on type systems, use lambda calculus as a foundation for basic programming language concepts.
    2. Though it may not be immediately obvious, our work thus far in Lisp has been essentially lambda-calculus based.
      1. In lambda calculus, all computation is carried out by untyped functions.
      2. In our approach to Lisp and its interpreter, we view computation in essentially the same way.
      3. That is, all computational constructs in Lisp, including its own evaluator, are represented as untyped functions.
    3. Here's a comparison table to show how standard lambda calculus and Lisp notations compare:

      Lambda Calculus Notation        Lisp Notation
      ------------------------        -------------
      λ x.x                           (lambda (x) x)
      f = λ x.x                       (defun f (x) x)
                                          -- or --
                                      (setq f (lambda (x) x))
      f 1                             (f 1)
                                          -- or --
                                      (apply f (list 1))
      g = λ x.f x                     (defun g (x) (f x))


    4. Notationally, there are some details of computation in pure lambda calculus that differ from Lisp.
      1. In general, pure lambda calculus is a more primitive notation, in that nothing is built-in except function definition and invocation.
      2. However, once basic arithmetic and list operations are constructed from lambda calculus primitives, it's essentially the same notation as pure Lisp.
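      As a concrete illustration of this construction, here is a sketch of
      Church numerals -- the classic encoding of arithmetic using nothing but
      function definition and invocation -- written in Common Lisp notation
      (the names zero, succ, and church->int are illustrative, not standard):

        ;; A Church numeral n is the function that applies f to x n times.
        (setq zero (lambda (f) (lambda (x) x)))
        (setq succ
              (lambda (n)
                (lambda (f)
                  (lambda (x) (funcall f (funcall (funcall n f) x))))))

        ;; Convert a Church numeral to an ordinary integer for inspection.
        (defun church->int (n)
          (funcall (funcall n (lambda (k) (+ k 1))) 0))

        (church->int (funcall succ (funcall succ zero)))   ; => 2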
    5. Given these observations, we will henceforth use Lisp notation for discussion purposes.
      1. We'll use it in the same way that the Hudak and Cardelli/Wegner papers use the pure lambda calculus notation.
      2. In particular, there is a Lisp-based formulation of typed lambda calculus presented later in the notes that we'll use as the basis for Assignment 2.
    6. So what exactly is a lambda expression?
      1. A lambda expression is an anonymous function value.
      2. E.g., the expression (lambda (x) (+ x 1)) is a function that adds one to its argument.
      3. What one does with a lambda expression is apply it to actual parameters, as in
        (apply (lambda (x) (+ x 1)) '(10))
        
        which delivers 11.
      4. Interestingly, the fundamental semantics of the following two definitions is the same:
        (defun f (x) (+ x 1))
        
        versus
        (setq g (lambda (x) (+ x 1)))
        
        that is, f and g define the same function.
      5. For technical reasons, Common Lisp disallows the direct application of a lambda form, as in (g 10).
        1. Hence in GCL, application of a lambda form must be performed as (apply g '(10))
        2. This disallowance is really just a technicality; earlier versions of Lisp did in fact treat the preceding definitions of f and g as exactly the same, and hence allowed direct eval of lambda forms.
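      To summarize the preceding two points as a short transcript (results
      assume Common Lisp evaluation rules):

        (defun f (x) (+ x 1))            ; named definition
        (setq g (lambda (x) (+ x 1)))    ; the same function, bound as a value

        (f 1)            ; => 2   -- direct invocation of the named function
        (apply g '(10))  ; => 11  -- the lambda value must be applied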
    7. Where are we currently using lambda expressions?
      1. In the solution to assignment 1, when we bind a function value we are effectively treating the function body as a lambda expression.
      2. The keyword "lambda" need not be used in the function binding, but the notion of treating an unevaluated function body as a piece of data is the same.
    8. What else are lambda expressions good for?
      1. The lambda calculus is in a sense the grandparent of all purely functional notations.
      2. When we study denotational semantics, we will define everything as a function, including memory itself.
      3. For example, here is the purely functional definition of a variable-binding memory:
        (setq memory
            (lambda (x) (cadr (assoc x '( (x 10) (y 20) (z 30))))))
        
      4. Conceptually, this definition treats memory as a function that when applied to the name of a memory location returns its value (sort of like the hardware definition of memory as a "map").
      5. Given this definition, here's how we look up the value stored at memory location y:
        (apply memory '(y))
        
      6. And here's how we add a new binding:
        (defun add-binding (memory binding)
            ;; reach into the quoted a-list inside memory's body and extend it
            (nconc (cadr (caddr (cadr (caddr memory)))) (list binding)))
        
        (excuse the nconc, please).
      7. When you can wrap your head around these definitions of memory, you're getting the gist of purely functional thinking.
      8. And lest you think the definition of add-binding is some kind of Lisp parlor trick, it is in fact the direct Lisp translation of the denotational semantics function for "memory perturbation" as defined by Tennent.
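      As an aside, the side effect in add-binding can be avoided entirely: a
      purely functional update builds a new memory function that shadows the
      old one. This is a sketch (the name extend-memory is illustrative), not
      part of Tennent's definition:

        ;; Return a NEW memory function; the original memory is untouched.
        (defun extend-memory (mem name value)
          (lambda (x)
            (if (eq x name)
                value
                (funcall mem x))))

        (setq memory2 (extend-memory memory 'w 40))
        (funcall memory2 'w)   ; => 40
        (funcall memory2 'y)   ; => 20 -- old bindings still visible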

  3. What does it mean for a language to be typed?
    1. Definition: A programming language is typed if every data value representable in the language has an associated category of representation called its type.
    2. A type itself is defined at one of two levels:
      1. a basic (or primitive or atomic) type
      2. a composite (or constructed or non-atomic) type
    3. Primitive types are defined as finite or infinite sets of values, e.g.,
      1. type boolean is the two-element set (true, false).
      2. type integer is the infinite set of all decimal integers
    4. Composite types are defined by recursively applicable composition rules applied to atomic and/or composite types, e.g.,
      1. a record type is composed of values of two or more types
      2. an array type is composed of zero or more values of the same type
    5. The type of a value constrains how the value may be interpreted -- i.e., how the value may be computationally manipulated.
    6. In Cardelli and Wegner's colorful metaphor
      1. a typed value is clothed to protect it from undesirable outside elements that may want to look at it in ways it should not be looked at
      2. an untyped value is naked and exposed to the outside world, subject to all manner of manipulation

  4. Kinds of typedness
    1. Strong versus weak typing
      1. A strongly typed language is one in which all values and variables have a consistently known type.
      2. A weakly typed language is one in which some or all values and/or variables have no type, or a type that changes inconsistently.
    2. Static versus dynamic typing
      1. A statically typed language is one in which the types of all values and variables are known before program execution begins.
      2. A dynamically typed language is one in which the types of some or all values and/or variables are not known until during program execution.
    3. Monomorphic versus polymorphic
      1. A monomorphic language is one in which all functions operate on parameters of exactly one type.
      2. A polymorphic language is one in which some or all functions may operate on parameters of more than one type.
    4. Encapsulated versus flat
      1. A language with encapsulated types provides the means to hide the representation of types from external view
      2. A language with flat typing provides no form of type encapsulation.
    5. Subtyped versus non-subtyped
      1. A language with a subtyping capability allows one type to be a parent type from which one or more child types may inherit properties.
      2. A language without subtyping provides no form of inheritance.
    6. Generic versus non-generic
      1. A language with generics is one in which some form of parameterized type is definable.
      2. A language without generics is one in which each type must be fully and explicitly defined.

  5. A spectrum of typeless to typeful programming languages
    1. Typing in Lisp is weak, dynamic, monomorphic, flat, and non-generic.
    2. Typing in C is somewhat weak, static, monomorphic, flat, and non-generic.
    3. Typing in C++ is weakish, mostly static, subtype polymorphic, encapsulated, and generic (via templates).
    4. Typing in Ada is strong, static, monomorphic, encapsulated, and generic.
    5. Typing in ML is strong, static, parametrically polymorphic, encapsulated, and generic.
    6. Typing in Java is strong, static (but dynamically queriable), subtype polymorphic, encapsulated, and non-generic.
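    Lisp's position at the weak/dynamic end of this spectrum is easy to
    demonstrate -- a variable has no declared type, and the type of its
    current value is discovered only at run time:

      (setq v 42)
      (integerp v)      ; => t
      (setq v "hello")  ; the same variable now holds a string
      (stringp v)       ; => t
      (type-of v)       ; the value's type, queried dynamically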

  6. The evolution of typing in programming languages
    1. LISP -- very minimal typing, non-explicitly declared
    2. FORTRAN -- numerical typing, but no real data structures
    3. ALGOL 60 -- the first significant language with explicit typing
    4. SIMULA 67 -- widely recognized as the first language to provide subtyping via inheritance; also provided a basic form of data abstraction
    5. ALGOL 68 -- ALGOL 60 typing on steroids
      1. An often-cited but infrequently-used language
      2. ALGOL 68 provided one of the first formal definitions of structural versus name type equivalence.
    6. Pascal -- a modernization of ALGOL 60, but with no inheritance or modularization.
    7. Smalltalk -- a modernization of Simula 67, but with no essentially new features.
    8. Modula-2 -- a modernization of Pascal, with the addition of encapsulation, but no generics and no inheritance.
    9. Ada -- a modernization of Modula-2, with the addition of generics, but no inheritance.
    10. Modula-3 and Oberon -- modernizations of Modula-2, with the addition of generics and inheritance.
    11. ML -- a modernization of Lisp, with the addition of all forms of typing plus a pioneering type inference mechanism.
    12. C++ -- a bastardization of all that came before it.
    13. Java -- back to the future of SIMULA 67, with C++ syntax.
    14. C# -- the emperor's old clothes.

  7. Kinds of polymorphism
    1. Genuine polymorphism (called "universal" in Cardelli and Wegner) is provided in (typed) languages where a function with a single body of code can be invoked with actual parameters of different types.
      1. For example, the following is a genuinely polymorphic equality function:
        forall type T,
          function Eq(x:T, y:T) =
            x = y;
        
      2. The body of the function works for any arguments of the same type for which equality is defined.
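      In dynamically typed Lisp the same effect comes for free: the following
      untyped analogue of Eq (eq-poly is an illustrative name) has a single
      body that works for arguments of any type that equal handles:

        (defun eq-poly (x y) (equal x y))

        (eq-poly 1 1)                ; => t
        (eq-poly "abc" "abc")        ; => t
        (eq-poly '(1 (2)) '(1 (2)))  ; => t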
    2. The use of the universal quantification (forall) in the preceding example is arguably the most general form of polymorphic function definition
      1. This form of polymorphism is called parametric
      2. It involves the use of type variables to define the types of function parameters
      3. These type variables stand in for any of a range of types, constrained only by how the body of the function is implemented.
    3. Another form of genuine polymorphism is available in languages that support subtyping through inheritance
      1. For example,
        class A = ... ;
        class B subclass of A = ... ;
        class C subclass of A = ... ;
        function Eq(x:A, y:A) =
            x = y;
        
      2. Here the function Eq is polymorphic over types A, B, and C.
      3. This is due to the normal rules for subtyping defined by inheritance (more on this shortly).
    4. The definition of polymorphic functions via inheritance is somewhat less general than parametric polymorphism.
      1. This type of polymorphism is called subtype (or inclusion in Cardelli and Wegner).
      2. The standard rule for subtype polymorphism is that a function defined on a parent type is polymorphic on all subtypes of that parent type.
    5. Many programming languages that do not support genuine polymorphism have features that provide apparent (called "ad-hoc" in Cardelli and Wegner) polymorphism.
      1. The two major forms of apparent polymorphism are overloading and coercion.
      2. What distinguishes genuine from apparent polymorphism is that with genuine polymorphism, neither the body of the polymorphic function nor the types of its actual parameters change when the function is invoked.
        1. In the case of overloading, there is a separate function body for every distinct set of argument types (i.e., the function body changes).
        2. In the case of coercion, the types of actual parameters are forced to the required type of the called function when the function is called (i.e., the actual parameter types change).
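      Lisp itself exhibits both apparent forms: the arithmetic functions are
      overloaded across the numeric types, and mixed-type arithmetic coerces
      operands by the usual floating-point contagion rule:

        (+ 1 2)      ; => 3    -- integer addition
        (+ 1 2.5)    ; => 3.5  -- the integer 1 is coerced to floating point
        (float 3)    ; => 3.0  -- the same coercion, requested explicitly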

  8. Type expression sublanguages (a.k.a., type algebras).
    1. In order to define data types conveniently, programming languages provide linguistic features for type construction.
    2. Typically, languages define a set of built-in atomic types, such as integers, strings, etc.
    3. To build composite types, languages provide a variety of mechanisms to build arrays, records, and the like.
    4. Type sublanguages vary widely in syntax, semantics, and power of expression.
    5. In the Cardelli and Wegner paper, as well as in our work with Lisp, we will do our best to factor out syntactic details, focus on fundamental semantics, and provide a maximally powerful form of definition.

  9. Types as sets (or lattices) of values
    1. In the definition above, we gave a two-fold specification of types:
      1. A base set of type primitives
      2. A set of type composition rules
    2. A more basic formal semantic definition of types can be given entirely in terms of sets (i.e., without resorting to any form of composition rules).
    3. We will discuss this form of definition later in the quarter in conjunction with other aspects of formal programming language semantics.

  10. A Lisp-based version of typed lambda calculus (the basis for programming assignment 2).
    1. Assignment 2 entails the addition of type checking to the untyped interpreter of Assignment 1.
    2. To pursue this work, we must add to standard Lisp a set of typing primitives and rules, i.e., a type algebra.
    3. Here is an overview, given in terms of the new functions that we will add:
      (deftype name type)
          where type is
            * one of the atomic type names sym, int, real, string, or bool
            * a composite type specified with one or more of the four type forms
              below
            * the name of a defined type, including a recursive reference to such a
              name
            * a type variable of the form ?X, for any atom X
      
      (array type [bounds])
          where bounds is either an integer, or (integer integer) pair, or a type var
      
      (record fields)
          where fields is a list of (name type) pairs in which all names must be
          unique; fields may be a single type var
      
      (union fields)
          where fields is a list of (name type) pairs in which all names and types
          must be unique; fields may be a single type var
      
      (function args outs [suchthat])
          where args and outs are lists of names or (name type) pairs; names and
          (name type) pairs can be mixed in a signature; the name in a (name type)
          pair may be empty; args and/or outs may each be a single type var; suchthat
          is of the form (suchthat predicate), and the predicate can reference type
          vars
      
    4. In conjunction with these additional typing functions, the defun will be extended to support typed and polymorphic arguments, as follows:

      (defun name args [outs] [suchthat] body)
          where args, outs, and suchthat are as defined above for the function type
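      To preview how these forms compose, here are some hypothetical uses of
      the proposed algebra. This is a sketch only -- the exact parenthesization
      of fields and bounds is an assumption based on the descriptions above,
      and none of these forms is built-in Lisp:

        (deftype intvec (array int 10))                ; 10-element int array
        (deftype point  (record ((x real) (y real))))  ; two named fields
        (deftype numval (union ((i int) (r real))))    ; tagged alternatives
        (deftype pair   (record ((fst ?T) (snd ?T))))  ; polymorphic via type var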
      


    5. Literal values for each of the types specified above are denoted as follows:

      Type       Literal Denotation
      ----       ------------------
      sym        any quoted atom
      int        any atom for which integerp is true
      real       any atom for which numberp is true and integerp is false
      string     any atom for which stringp is true
      bool       t or nil
      array      a list, the elements of which meet the array's bounds and type specs
      record     a list, the elements of which meet the record's field specs
      union      a value whose type is one of the field types
      function   the name of a defun'd function or a lambda expression whose
                 signature matches the function type's args and outs specs


  11. Further details are in the Assignment 2 writeup, q.v.

  12. Some discussion of the readings on functional languages.

  13. Some discussion of the readings on type systems.



