CSC 103 Lecture Notes Week 7
More Kinds of Trees -- Heaps, B-Trees, Red-Black Trees



  1. Overview
    1. In these notes we examine some additional types and applications of tree data structures.
    2. Heaps are a type of tree that can be used to represent a priority queue data structure, which is a very useful variant of a queue.
    3. B-Trees are a form of n-ary search tree, typically used to hold large databases on external storage devices such as disks.
    4. Red-black trees are a popular alternative to AVL trees for maintaining height-balancing in a binary search tree.

  2. Heaps as priority queues
    1. A priority queue data abstraction provides three basic operations: insert, deleteMin, and findMin.
      1. insert is like the enqueue operation in a queue, but with a priority number.
      2. deleteMin is a priority-based dequeue operation.
      3. findMin is a priority-based first operation.
    2. In what follows, we will use a tree data structure as the concrete data representation of a priority queue.
      1. Specifically, we will use a form of balanced binary tree called a heap.
      2. This heap-based representation of a priority queue provides the following performance for the basic priority queue operations:
        1. O(log N) for insert
        2. O(log N) for deleteMin
        3. O(1) for findMin
        where N is the number of elements in the priority queue.
    3. Heap structure property
      1. A binary tree that is completely filled, except perhaps for the last row.
      2. This is called a complete binary tree (see Figure 6.2).
      3. For height h, contains between 2^h and 2^(h+1) - 1 nodes.
      4. This means the height is O(log N), for N = the number of nodes.
    4. A very nice property of heaps is that they can be represented directly in an array, without using a pointer-based structure.
      1. Figure 6.3 shows the array-based representation of Figure 6.2.
      2. For any element at position i:
        1. the left child is at position 2i
        2. the right child is the element after the left at position 2i +1
        3. the parent is at position floor(i/2).
      3. These facts mean that tree traversal is very simple and efficient.
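      The index arithmetic above can be sketched directly (a minimal illustration, assuming the heap array leaves slot 0 unused so the root sits at position 1):

      ```python
      def left(i):
          """Index of the left child of the node at position i."""
          return 2 * i

      def right(i):
          """Index of the right child of the node at position i."""
          return 2 * i + 1

      def parent(i):
          """Index of the parent of the node at position i (the root, i = 1, has no parent)."""
          return i // 2
      ```

      Because these are pure arithmetic on the index, moving between a node and its children or parent costs O(1) with no pointers to follow.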
    5. A max size estimate must be provided whenever a new array-based heap object is constructed
      1. This is not typically a big problem.
      2. Plus, we can resize if necessary.
    6. Figure 6.4 is a class skeleton.
    7. Heap order property
      1. The primary goal for a heap is to find the minimum-value element quickly.
      2. Given this, it makes sense to store the smallest element at the root, and maintain this property recursively throughout the tree.
      3. Hence, the heap order property is:
        For every node X, the key value of the parent of X is <= the key of X, except for the parentless root of the tree.
      4. By this property, the min value in any tree is always at the root, which means findMin runs in O(1) time.
      5. Figure 6.5 illustrates two complete binary trees, one a heap (on the left) the other not a heap (on the right).
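      The heap order property stated above can be checked mechanically. A small sketch (1-based array with a[0] unused, as in the array representation from section 2.4; this checker is illustrative, not from the textbook):

      ```python
      def is_min_heap(a, size):
          """Return True iff a[1..size] satisfies the heap order property:
          every node's parent key is <= the node's key."""
          for i in range(2, size + 1):      # skip the parentless root at 1
              if a[i // 2] > a[i]:
                  return False
          return True
      ```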

  3. Applications of priority queues.
    1. They're used a lot in operating systems, where jobs are put on a queue, but given a priority in terms of how soon they should be run.
    2. A heap is also used as the basis of an O(N log N) sorting algorithm, known not coincidentally as heapsort.

  4. Implementing the basic heap operations
    1. insert
      1. To insert a node X into a heap, we create a hole at the next available location.
      2. Since we must maintain the complete tree structure property, this hole must be at the next available spot along the frontier of the tree.
      3. If X can be placed in the hole without violating the order property, we do it and we're done.
      4. Otherwise, we slide the hole-node's parent into the hole, thus bubbling the hole up towards the root.
      5. We continue this until X can be placed in the location of the hole without violating the heap order property.
      6. Figures 6.6 and 6.7 illustrate.
      7. This process is percolate up, whereby an element to be inserted is percolated up towards the root until it finds its proper place in heap order.
      8. The code is given in Figure 6.8.
      9. Note that the algorithm outlined above does not swap elements, but just moves the hole, avoiding the extra time for unneeded assignment statements.
      10. An array-based trace is on the back of page 190.
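      The percolate-up insert described above can be sketched as follows (a 1-based array heap with heap[0] unused; illustrative code, not the book's Figure 6.8):

      ```python
      def insert(heap, x):
          """Insert x into a min-heap stored in a 1-based list (heap[0] unused).
          Percolates the hole up by sliding parents down; no swaps."""
          heap.append(x)                     # create the hole at the next frontier slot
          hole = len(heap) - 1
          while hole > 1 and heap[hole // 2] > x:
              heap[hole] = heap[hole // 2]   # slide the parent down into the hole
              hole //= 2                     # the hole bubbles up toward the root
          heap[hole] = x                     # x finally lands in heap order
      ```

      Each iteration moves the hole one level up, so the loop runs at most O(log N) times.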
    2. deleteMin
      1. Heap delete is done in a manner similar to insert.
      2. First we find the minimum, which is guaranteed to be at the root.
      3. When the minimum is deleted, a hole is created at the root.
      4. In order to maintain the heap structure property, we must remove the last leaf, X, and fill the hole with an appropriate value.
      5. If X can be placed at the root, we're done.
      6. If not, we slide the smaller of the hole's children into the hole, and repeat the process until X can be properly placed.
      7. This is a percolate down process, analogous to the percolate up of insert.
      8. Figures 6.9 through 6.11 illustrate.
      9. Figure 6.12 is the code.
      10. An array-based trace is on the back of page 192.
      11. Running times:
        1. O(log N) worst case.
        2. O(log N) average case, for equally likely keys.
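    The percolate-down deleteMin described above can be sketched like this (same 1-based array layout as before; illustrative code, not the book's Figure 6.12):

    ```python
    def delete_min(heap):
        """Remove and return the minimum of a min-heap in a 1-based list
        (heap[0] unused), percolating the hole down with the smaller child."""
        min_item = heap[1]
        last = heap.pop()               # remove the last leaf X
        size = len(heap) - 1
        if size == 0:                   # X was the root itself
            return min_item
        hole = 1                        # the hole starts at the root
        while 2 * hole <= size:
            child = 2 * hole
            if child < size and heap[child + 1] < heap[child]:
                child += 1              # pick the smaller of the two children
            if heap[child] < last:
                heap[hole] = heap[child]   # slide the smaller child up
                hole = child
            else:
                break                   # X fits here without violating order
        heap[hole] = last
        return min_item
    ```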

  5. Other heap operations.
    1. A heap by itself is good at finding the minimum, but otherwise bad at finding other elements, since it maintains no other ordering information.
    2. If we want to be able to get at the ith element, we can use some additional data structure, such as a hash table that stores the position of a given key element (see Figure on back of 192).
    3. If we assume we have some indexing structure such as this, whereby we can find the ith element, we can provide the following operations, all of which run in O(log N) time.
      1. decreaseKey(position, amount)
        1. This lowers the value of the key at the given position by the given amount.
        2. If the change violates the order property, it can be fixed by percolating up.
      2. increaseKey(position, amount)
        1. This increases the value of the key at the given position by the given amount.
        2. If the change violates the order property, it can be fixed by percolating down.
      3. delete(position)
        1. Removes the element at the given position.
        2. Performed by first doing decreaseKey(position, maxInt) followed by deleteMin().
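    The decreaseKey operation above can be sketched as a percolate up from the given position (a 1-based array heap as before; this code is illustrative, and delete(position) would combine it with deleteMin as described):

    ```python
    def decrease_key(heap, pos, amount):
        """Lower heap[pos] by amount, then percolate up to restore heap order.
        heap is a 1-based list with heap[0] unused."""
        x = heap[pos] - amount
        while pos > 1 and heap[pos // 2] > x:
            heap[pos] = heap[pos // 2]   # slide the parent down
            pos //= 2
        heap[pos] = x
    ```

    increaseKey is symmetric, percolating down instead of up.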
    4. A particularly interesting operation is buildHeap, which takes N input items (say in an array) and builds a heap.
      1. The strategy is:
        1. Place the N items in any order into the heap array, maintaining the structure property but not initially the order property.
        2. Percolate down the top half of the tree, using the algorithm given in Figure 6.14.
      2. With careful analysis, this can be shown to run in O(N) time.
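      The buildHeap strategy above can be sketched as follows (a 1-based array heap with a[0] unused; an illustrative version of the idea, not the book's Figure 6.14 code):

      ```python
      def percolate_down(a, hole, size):
          """Percolate a[hole] down within a[1..size] to restore heap order."""
          x = a[hole]
          while 2 * hole <= size:
              child = 2 * hole
              if child < size and a[child + 1] < a[child]:
                  child += 1              # pick the smaller child
              if a[child] < x:
                  a[hole] = a[child]      # slide the smaller child up
                  hole = child
              else:
                  break
          a[hole] = x

      def build_heap(a):
          """Turn a 1-based array (a[0] unused) into a min-heap in O(N) time,
          percolating down each non-leaf, from the middle back to the root."""
          n = len(a) - 1
          for i in range(n // 2, 0, -1):  # leaves (i > n//2) are trivially heaps
              percolate_down(a, i, n)
      ```

      The O(N) bound comes from most percolations starting near the leaves, where they are short.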

  6. The selection problem.
    1. A widely-used operation on collections is findKth, which finds the kth smallest (or largest) element in a collection.
    2. A quick-and-dirty algorithm to do this is to do a simple sort, in O(N^2) time, and then access the kth element in O(1) time, for a total running time of O(N^2).
    3. An algorithm that uses buildHeap can do this operation in O(N log N) time, as follows:
      1. Read the N elements into an array.
      2. Apply buildHeap.
      3. Perform k deleteMin operations.
      4. The kth item deleted is the one we're looking for.
    4. Running time:
      1. Worst case for buildHeap is O(N).
      2. Worst case for deleteMin is O(log N).
      3. Since there are k deleteMins, we get a total running time of O(N + k log N).
      4. If k = O(N / log N), buildHeap dominates the running time, and the total is O(N).
      5. For larger k, the running time maxes out at O(N log N).
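    The selection algorithm above can be sketched compactly using Python's standard heapq module for brevity (heapq uses 0-based indexing internally; its heapify is the O(N) buildHeap):

    ```python
    import heapq

    def find_kth_smallest(items, k):
        """Selection via a heap: O(N) buildHeap plus k deleteMins,
        for O(N + k log N) total."""
        a = list(items)
        heapq.heapify(a)            # buildHeap: O(N)
        for _ in range(k - 1):      # the first k-1 deleteMins
            heapq.heappop(a)
        return heapq.heappop(a)     # the kth deleteMin is the answer
    ```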

  7. B-Trees -- from book.

  8. Red-black trees -- from book.



