CSC 103 Lecture Notes Week 6

Program Design Issues;
Analytic and Empirical Running Time Calculations

Midterm this Thursday.
Program design issues illustrated in HashTable example.
1. Use of exception classes in a good design.
2. The use of Object and interfaces in collection class design.
3. The explicit design of testing classes.
Other design issues for Assignment 3.
1. Reuse of the GeneralList class.
2. The use of an iterator class for GeneralList.

Details of collection class testing design.

Collection classes need to be thoroughly tested to validate their implementation.
The typical phases of thorough testing for a collection class include the following:
1. Phase 1: Test the class constructor(s), saving the results for use in subsequent testing phases.
2. Phase 2: Test the constructive methods, building class objects of varying sizes and saving the results for use in subsequent testing phases.
3. Phase 3: Test the non-destructive access methods on the results of Phase 2.
4. Phase 4: Test the destructive access methods on the results of Phase 2.
5. Phase 5: Test certain interleavings of constructive, access, and destructive methods that can reveal flaws in method algorithms.
6. Phase 6: Repeat phases 1 through 5 to ensure repeatability of results.
7. Phase 7: Stress test by constructing, accessing, and destructing a very large collection, an order of magnitude larger than is ever expected to be used in practice.
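
The phases above can be sketched as a small testing class. This is only an illustration under stated assumptions: java.util.ArrayList stands in for the collection under test, and the class name PhasedCollectionTest and its method names are hypothetical, not part of the course code.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of a phased testing class.  ArrayList is only a stand-in for the
 * collection under test; all names here are hypothetical.
 */
class PhasedCollectionTest {

    static int checksPassed = 0;

    /** Fails loudly on a false condition; otherwise counts the check. */
    static void check(boolean condition, String description) {
        if (!condition) {
            throw new AssertionError("FAILED: " + description);
        }
        checksPassed++;
    }

    /** Phase 1: test the constructor, saving the result for later phases. */
    static List<Integer> phase1() {
        List<Integer> empty = new ArrayList<>();
        check(empty.isEmpty(), "a new collection is empty");
        return empty;
    }

    /** Phase 2: test the constructive method by building a collection of size n. */
    static List<Integer> phase2(List<Integer> list, int n) {
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        check(list.size() == n, "size after " + n + " adds");
        return list;
    }

    /** Phase 3: test non-destructive access on the Phase 2 result. */
    static void phase3(List<Integer> list) {
        for (int i = 0; i < list.size(); i++) {
            check(list.get(i) == i, "get(" + i + ")");
        }
    }

    /** Phase 4: test destructive access on the Phase 2 result. */
    static void phase4(List<Integer> list) {
        while (!list.isEmpty()) {
            int last = list.size() - 1;
            check(list.remove(last) == last, "remove(" + last + ")");
        }
    }

    /** Phase 5: interleave constructive and destructive calls. */
    static void phase5() {
        List<Integer> list = new ArrayList<>();
        list.add(1);
        list.add(2);
        list.remove(0);     // removes the element at index 0, leaving [2]
        list.add(3);        // list is now [2, 3]
        check(list.size() == 2 && list.get(0) == 2 && list.get(1) == 3,
              "interleaved add/remove");
    }

    /** Runs phases 1 through 5 on a collection of size n. */
    static void runPhases1Through5(int n) {
        List<Integer> built = phase2(phase1(), n);
        phase3(built);
        phase4(built);
        check(built.isEmpty(), "collection is empty after all removes");
        phase5();
    }

    public static void main(String[] args) {
        runPhases1Through5(10);
        runPhases1Through5(10);         // Phase 6: repeat phases 1 through 5
        runPhases1Through5(1_000_000);  // Phase 7: stress test
        System.out.println("Checks passed: " + checksPassed);
    }
}
```

Saving the result of each phase and passing it to the next keeps later phases from silently depending on a broken earlier one: a failure in Phase 2 stops the run before Phase 3 reports misleading results.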

E.g., here is a testing plan for the HashTable example, taken from the class header comment:

 *     Phase 1: Test the constructor, building tables of sizes 1, 5, 500, and
 *              the default size; confirm the size of each table.
 *
 *     Phase 2: Test the enter method, lookup, and delete methods on table of
 *              size 1.
 *
 *     Phase 3: Test enter method by filling up a table of size 5, including
 *              check of the TableFull exception.
 *
 *     Phase 4: Test the lookup method on the full table of size 5, expecting
 *              O(N) performance on lookups given full table.
 *
 *     Phase 5: Test the delete method on table of size 5, removing each entry.
 *
 *     Phase 6: Repeat phases 3 through 5 to exercise the somewhat tricky
 *              implementation of delete that uses active/inactive flags.
 *
 *     Phase 7: Successively test the enter, lookup, and delete methods on the
 *              same table of size 5, with 0 through 7 entries.  Expect
 *              O(N) performance on lookups since entries are all marked as
 *              inactive instead of being null.
 *
 *     Phase 8: Rerun phase 7 on a new table of size 5, expecting O(c)
 *              performance on early lookups.
 *
 *     Phase 9: Test enter and lookup on a larger more sparsely populated
 *              table, expecting O(c) performance.

Analytic timing functions
1. Assignment 3 requires the definition of analytic timing functions for three of the methods you will implement.
2. As we discussed in early lectures, a timing function defines how long we expect an algorithm to take when it runs.
  1. Big-O notation is a large-grain, order-of-magnitude definition of running time.
  2. A timing function is a more precise and detailed measure of running time.
3. As an initial example, let's consider a timing function for one of the simple algorithms we've worked with this quarter -- a findLargest(int[]) method that performs a linear search on an array:
```
        public int findLargest(int[] numbers) {

/* 1 */     int largestValue;                       // Largest value so far
/* 2 */     int i;                                  // Array index

/* 3 */     if ((numbers == null) || (numbers.length == 0)) {
/* 4 */         return Integer.MIN_VALUE;
/* 5 */     }

/* 6 */     for (largestValue = numbers[0], i = 1; i < numbers.length; i++) {
/* 7 */         if (largestValue < numbers[i]) {
/* 8 */             largestValue = numbers[i];
                }
            }
/* 9 */     return largestValue;
        }
```
4. For this method, the problem size measure N is the length of the numbers input array; so assume N = numbers.length.
5. We can then compute a specific running time function for this method as follows:
  1. The declarations on lines 1 and 2 count for no time.
  2. The running time of the if expression on line 3 is based on the number of individual operators involved in the expression, which in this case is two equality ops.
    1. Assume that each of these takes a constant amount of time c_eq.
    2. This means our timing function so far is T(N) = 2c_eq.
  3. The running time of the return statement on line 4 (as well as on line 9) can be considered a small constant c_return; this gives us T(N) = 2 * c_eq + c_return so far.
  4. The for-loop on line 6 involves two assignment statements, one comparison operator, and one increment operator.
    1. The crucial thing to observe here is that the test and increment steps of the loop are each executed about N times (N tests and N - 1 increments, which we round to N).
    2. So, the timing function now looks like T(N) = 2 * c_eq + c_return + 2 * c_assign + N * (c_compare + c_incr).
  5. Finally, the body of the loop has a comparison, an assignment, and two array accesses. Assume these operations have times c_compare, c_assign, and 2 * c_array-access.
  6. Given that these operations are in the body of the loop, the timing function becomes T(N) = 2 * c_eq + c_return + 2 * c_assign + N * (2 * c_compare + c_incr + c_assign + 2 * c_array-access).
6. Adding in the constant times for the two returns, we get T(N) = 2 * c_eq + 2 * c_return + 2 * c_assign + N * (2 * c_compare + c_incr + c_assign + 2 * c_array-access).
7. In terms of big-Oh notation, the running time is O(N).
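
To see how the final timing function behaves, it can be evaluated for a few values of N. The sketch below is only an illustration: the class name FindLargestTiming is hypothetical, and the per-operation costs are made-up values chosen for easy arithmetic, not measurements of any real machine.

```java
/**
 * Evaluates the timing function derived for findLargest.
 * The unit costs below are assumed illustration values, not measurements.
 */
class FindLargestTiming {

    // Hypothetical per-operation costs, in arbitrary time units.
    static final double C_EQ = 1.0;
    static final double C_RETURN = 1.0;
    static final double C_ASSIGN = 1.0;
    static final double C_COMPARE = 1.0;
    static final double C_INCR = 1.0;
    static final double C_ARRAY_ACCESS = 2.0;

    /**
     * T(N) = 2 * c_eq + 2 * c_return + 2 * c_assign
     *        + N * (2 * c_compare + c_incr + c_assign + 2 * c_array-access)
     */
    static double t(int n) {
        double constantPart = 2 * C_EQ + 2 * C_RETURN + 2 * C_ASSIGN;
        double perIteration =
            2 * C_COMPARE + C_INCR + C_ASSIGN + 2 * C_ARRAY_ACCESS;
        return constantPart + n * perIteration;
    }

    public static void main(String[] args) {
        // Doubling N roughly doubles T(N), as expected for O(N).
        for (int n = 1000; n <= 8000; n *= 2) {
            System.out.println("N = " + n + "  T(N) = " + t(n));
        }
    }
}
```

With these assumed costs the constant part is 6 units and each loop iteration costs 8 units, so the constant part quickly becomes negligible as N grows. That is why big-Oh notation keeps only the N term.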

Analytic timing function for HashTable.enter.

```
public HashTable enter(HashTableEntry entry)
        throws HashTableFull, HashIndexInvalid {

    int index;                      // Hash index
    int i;                          // Vacant entry search loop index

    /*
     * Time = time of getKey method + time of hash method.
     */
    index = entry.hash(entry.getKey(), size);

    /*
     * Time = 2 constant compares + constant throw + constant constructor.
     */
    if ((index < 0) || (index >= size)) {
        throw new HashIndexInvalid(index);
    }

    /*
     * Time = two constant array accesses, one compare, two assigns, one
     * return.
     */
    if (table[index] == null) {
        table[index] = lastEntry = entry;
        return this;
    }

    /*
     * Time = constant array access + return + 2 * time of getKey + time of
     * equals.
     */
    if (table[index].getKey().equals(entry.getKey())) {
        return this;
    }

    /*
     * Time = K * time for loop body, where
     *     K is the number of times through the loop
     *     loop body time = times for elements as computed above
     */
    for (i = index + 1; i != index; i++) {

        /*
         * If we've come to the last entry in the table, set the loop index
         * to -1 so the search will continue at the top of the table.
         */
        if (i == size) {
            i = -1;
            continue;
        }

        /*
         * Output some trace information.
         */
        System.out.println("  Probing entry " + i +
            " with entry key " + entry.getKey());

        /*
         * If we've come to an empty table spot, put the entry there.  Save
         * the entry in the lastEntry data field, for use by lookup.
         */
        if (table[i] == null) {
            table[i] = lastEntry = entry;
            return this;
        }

        /*
         * If the entry at the ith spot has the same key as the given
         * entry, we quit without doing anything.
         */
        if (table[i].getKey().equals(entry.getKey())) {
            return this;
        }

    }

    /*
     * If we've come out of the loop, it means we've fully exhausted all
     * table entries, which means the table is full.
     */
    throw new HashTableFull();

}
```

The important issue here is the value of K for the number of times the loop body is executed.
1. In the worst case, K = N, where N is the number of elements in the table; this occurs when the table is filled with entries that have keys that all collide.
2. In the best case, K = 1; this occurs when the first slot probed in the loop is vacant. (With no collision at all, the method returns before the loop is reached.)
3. In the average case, K = c << N; this occurs when the number of collisions is a constant, independent of the table size.
Overall, the worst case, and hence the big-Oh performance of HashTable.enter, must be stated as O(N).
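
The effect of collisions on K can be observed with a small probe-counting experiment. The sketch below is not the course HashTable class: ProbeCounter is a hypothetical, stripped-down linear-probing table over int keys, and it assumes the hash function is simply the key modulo the table size.

```java
/**
 * Minimal linear-probing table that counts probes, to illustrate K.
 * Hypothetical stand-in for the course HashTable; assumes hash = key mod size.
 */
class ProbeCounter {

    private final Integer[] table;

    ProbeCounter(int size) {
        table = new Integer[size];
    }

    /**
     * Enters the key using linear probing.  Returns K, the number of slots
     * examined beyond the home slot, or -1 if the table is full.
     */
    int enter(int key) {
        int size = table.length;
        int home = Math.floorMod(key, size);
        for (int k = 0; k < size; k++) {
            int slot = (home + k) % size;
            // Stop at a vacant slot or at a slot already holding this key.
            if (table[slot] == null || table[slot] == key) {
                table[slot] = key;
                return k;
            }
        }
        return -1;  // Every slot holds a different key.
    }

    public static void main(String[] args) {
        // Worst case: every key hashes to slot 0, so K grows with each entry.
        ProbeCounter colliding = new ProbeCounter(8);
        for (int j = 0; j < 8; j++) {
            System.out.println("colliding key " + (8 * j)
                + ": K = " + colliding.enter(8 * j));
        }

        // Sparse case: distinct home slots, so K stays at 0.
        ProbeCounter sparse = new ProbeCounter(1024);
        for (int j = 0; j < 8; j++) {
            System.out.println("sparse key " + j + ": K = " + sparse.enter(j));
        }
    }
}
```

In the colliding run, the j-th entry needs j extra probes, so filling the table costs time proportional to N for the last entry. In the sparse run every key lands in its home slot, which is the constant-time behavior hashing is meant to deliver.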

Empirical running times.
1. An analytic timing function gives us a general formula for how long we expect a function to take when it executes.
2. We could get a hard number from an analytic timing function by assigning concrete values to the running time constants, based on what we know about how long the different programming language constructs take to execute.
  1. Such timing numbers require internal knowledge about how compilers work and how compiled code executes on different computer architectures.
  2. These kinds of hard numbers can be hard to come by.
3. Therefore, the typical way to obtain actual running time numbers is to use a time-gathering library function of some form on an executing program.
  1. In the case of Java, the timing function is the method System.currentTimeMillis, which returns the current time in milliseconds since midnight, January 1, 1970 UTC.
  2. By bracketing a computation with calls to currentTimeMillis and subtracting the first result from the second, actual timing numbers are obtained.
4. The general approach to obtaining empirical timing numbers is to run a subject program with a systematically varying problem size; we will examine this kind of measurement in lab.
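
The bracketing technique can be sketched as follows, using the standard System.currentTimeMillis method. The class name EmpiricalTiming, the problem sizes, and the repetition count are all assumptions chosen for illustration; the findLargest method is a compact version of the one analyzed earlier in these notes.

```java
/**
 * Sketch of empirical timing with systematically varying problem size.
 * Sizes and repetition counts are arbitrary illustration values.
 */
class EmpiricalTiming {

    // Written to on every call so the compiler cannot discard the work.
    static volatile int sink;

    /** Returns the largest value, or Integer.MIN_VALUE for a null or empty array. */
    static int findLargest(int[] numbers) {
        if ((numbers == null) || (numbers.length == 0)) {
            return Integer.MIN_VALUE;
        }
        int largestValue = numbers[0];
        for (int i = 1; i < numbers.length; i++) {
            if (largestValue < numbers[i]) {
                largestValue = numbers[i];
            }
        }
        return largestValue;
    }

    /** Returns the milliseconds taken by the given number of calls on an array of size n. */
    static long timeFindLargest(int n, int repetitions) {
        int[] numbers = new int[n];
        for (int i = 0; i < n; i++) {
            numbers[i] = i;     // Ascending order: the assignment runs every iteration.
        }

        long start = System.currentTimeMillis();    // Opening bracket
        for (int r = 0; r < repetitions; r++) {
            sink = findLargest(numbers);
        }
        long end = System.currentTimeMillis();      // Closing bracket

        return end - start;
    }

    public static void main(String[] args) {
        // Double the problem size each time and watch how the time grows.
        for (int n = 100_000; n <= 1_600_000; n *= 2) {
            System.out.println("N = " + n + "  time = "
                + timeFindLargest(n, 100) + " ms");
        }
    }
}
```

A single call on a small array usually finishes in well under a millisecond, which is why the sketch repeats the call many times inside the brackets. For finer-grained elapsed-time measurements, Java also provides System.nanoTime.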