1 Arrays are Structures, not types
1.1 Important concepts from readings
1.2 Opening Questions
1.3 Data Type vs. Data Structures
1.4 Fixed Capacity Stack Implementation
1.5 Expandable Fixed Capacity Stack Implementation
1.6 In-class question
1.7 Iteration
1.8 Analysis of Experimental Data: Order of growth
1.9 Demonstration: Anagram Reborn
1.10 Sample Exam Question: solved
1.11 Sample Exam Question
1.12 Lecture Takeaways
1.13 Daily Exercise
1.14 Version : 2015/ 11/ 03

CS 2223 Nov 02 2015

Lecture Path: 04
Back Next

Expected reading: pp. 132-141, 176-183
Daily exercise: continuing Anagram hunt
Sample exam question: Postfix processing

We struggle with, agonize over and bluster heroically about the great questions of life when the answers to most of these lie hidden in our attitude toward the thousand minor details of each day.
Robert Grudin
Time and the art of living

1 Arrays are Structures, not types

1.1 Important concepts from readings

1.2 Opening Questions

1.3 Data Type vs. Data Structures

An array is a data structure, not a data type. You can be quite certain of that. The reason is simple. There is no API for an array. It offers two specific capabilities, namely, get the nth item and set the nth item.

In addition, you can’t even resize most arrays. The best you can do is create a new array that is larger (or smaller) than the original one, and then you copy elements from the original array into the new structure. The obvious conclusion is that resizing a stack takes time directly proportional to the stack size. This point will come up later this lecture.

This distinction is important because we are going to use arrays to implement a number of data types, starting with Stack

1.4 Fixed Capacity Stack Implementation

You are seen FixedCapacityStackOfStrings implementation (p. 133). This code can be found in the Git repository, in package algs.hw1.

public class FixedCapacityStackOfStrings { private String[] a; // holds the items private int N; // number of items in stack // create an empty stack with given capacity public FixedCapacityStackOfStrings(int capacity) { a = new String[capacity]; N = 0; } public boolean isEmpty() { return N == 0; } public boolean isFull() { return N == a.length; } public void push(String item) { a[N++] = item; } public String pop() { return a[−−N]; } }

Here are some questions to ask

Are you familiar with N++ and −−N operators?

What does the field N represent? Can you put it in words?

Can you make the String[] a field have the final modifier?

Can you make the String[] N field have the final modifier?

If you continue with p. 135 you will see a generic FixedCapacityStack implementation that uses the Java Generics capability to define a Stack of any type of element. Functionality is the same, this is just easier to program with.

The biggest limitation is that this will run out of the existing memory, so we need a strategy to deal with expanding storage on demand.

1.5 Expandable Fixed Capacity Stack Implementation

Pages 136-137 describe how to resize the stack as needed to ensure it grows to support the full set of objects being pushed onto the stack. This implementation is efficient for several reasons:

The goal, as stated by Sedgewick, is to ensure that "push and pop operations take time independent of the stack size." (p. 132). How can you be sure this is true if you know that resizing the stack takes time directly proportional to the stack size.

Statistics to the rescue.

Let’s say you did N operations, and each one took a fixed amount of time. For simplicity, we’ll call this a single unit of time, without regard to whether it is seconds or hours in length. Once completed, these N operations will have taken up N units of time

1 + 1 + 1 + 1 + ... + 1 + 1 = N time units

So the average is N/N or 1 time unit.

All good so far. What if some operations take longer? For example, what if each successive operation takes one more unit of time than the last? With N operations, you would then have

1 + 2 + 3 + 4 + ... + N-1 + N = N*(N+1)/2 time units

Here the average is (N+1)/2 time units. No longer is the time of an operation considered to be independent of other operations. In addition, the average is no longer a constant number.

Now let’s review the behavior of Stack. As long as the stack is not full, you can be assured that each push or pop operation will take time "independent of stack size", so it can be treated as a single time unit.

Now, what if you could guarantee that given a sequence of N operations, N-1 would require a single time unit while just one would require N time units. Note it is important that the N is the same value (both to count the number of operations and to reflect the length of the time to perform). The reason is that this special operation would take time to perform "in time that is dependent on the size of the input" (to twist the statement on page 132).

Now your summation becomes:

1 + 1 + 1 + 1 + ... + 1 + N

That is, 1 added N-1 times, and N added one final time. The total is 2N-1, which when divided by N is 2-1/N which can be considered to be a constant especially with increasing values of N.

How do you ensure this nice distribution? First, let’s review the code (which I have placed in algs.days.day04.ResizingArrayStack. Here is the revised push function.

public void push(Item item) { if (N == a.length) resize(2*a.length); // double size of array a[N++] = item; // add item }

Why double the size of the array?

So only when the array is FULL does the resize operation take place. But doesn’t it seem like OVERKILL to double the size of the array, just to make room for one more item?

Here’s the thing. Empirical studies have consistently shown that this is the best approach When you have no idea as to the number of elements that will be pushed onto the stack.

Brief timing demonstration to explain benefit. algs.days.Day04. ResizingArrayStack. When you compare two separate runs, how much SLOWER is the one that only extends linearly, say by 100 positions?

Resize by doubling Final Time: 0.01 sometimes 0.02 (stack size > 40000) (#Resize = 15) Stats: 0.00812842857142325, +/-0.003900404082173325 Stats: 0.009349857142850712, +/-0.002465528275471365 Stats: 0.007141957142862597, +/-0.0026478060533975866 Stats: 0.006713285714288692, +/-0.0026545128439921287 Resize by extending by 100 positions Final Time: 0.06 sometimes 0.07 (stack size > 40000) (#Resize ~= 422) Stats: 0.027894999999977542, +/-0.01669653814606029 Stats: 0.0278348571428348, +/-0.016968600784974114 Stats: 0.02109185714284204, +/-0.01744786265522115 Stats: 0.024319957142858208, +/-0.017627892533654304

Shrink size of array and prevent loitering

Note that the Resize capability has two distinct features. First, once the array is only 1/4 full, its size is cut in half. This is the opposite logic of the resize, if you think about it. Also, it makes sure to null out any entries once a value has been popped. This helps during the Garbage collection process.

Discussion on the size of the initial array for the stack.

Did anyone see the initial size of the array used for a newly created stack? Any thoughts?

1.6 In-class question

You are given a stack of elements with storage of size 4 and containing two elements as shown. Propose a sequence of operations that results in the revised stack with storage of size 8 and containing 5 elements as shown.

Figure 1: Open Discussion

1.7 Iteration

Review the discussion on page 138-139 which describes how to traverse all elements of an aggregate data type. In this case, the structure is an array, so a ReverseArrayIterator is developed.

Briefly, why does a Stack need a reverse array iterator instead of a regular forward iterator?

Note in the Tuesday Lecture help session tomorrow (12 Noon in FLAUD Lower) I will go over the Iterator code. This will be recorded, so if you can’t make this special help session, you will be able to review the video when you can.

You can review the algs.iterator.ArrayIterator sample code I have provided for iterating in forward direction over an array of elements.

1.8 Analysis of Experimental Data: Order of growth

Sedgewick presents the case for running benchmarks on your data to determine runtime performance. From this data, you can determine trends by plotting the results.

The goal is to determine Order of Growth for performance (p. 180). Here is sample output for algs.days.Day04.DoublingTest.

Growth Hypothesis?

250 0.0 500 0.1 1000 0.5 2000 4.0 4000 31.4 8000 259.4

Develop mathematical model (following upon Knuth’s foundations) that the running time of a program is determine by:

Sedgewick uses Tilde approximation which abstracts most code blocks into constant execution, relying on identifying the most frequently executed operations. Typically these are deep within a nested for loop, or you can determine the number of recursions that a recursive function call makes. We will introduce and present this topic during the remainder of this week. We can’t end a lecture by launching into a detailed mathematical conceptual presentation.

1.9 Demonstration: Anagram Reborn

To show the new algorithm, here is a pseudocode fragment. You will be asked to work with pseudocode this term, whether in class, or on a homework or on an exam so it is good to become comfortable with this notation.

INPUT: letters to use OUTPUT: list of anagrams using permutation of those letters only # positions is lowest permutation of n index positions anagrams = {} positions = [0, 1, 2, 3, ..., n-1] while positions is not empty word is permutation of letters if word is in dictionary and not in anagrams add word to anagrams else find largest prefix of word which is not a prefix of any dictionary word. truncate positions[] to this spot advance to next permutation, removing prior positions as needed.

In pseudocode, you can have variables and functions. Loops and if-then-else conditionals are also available. You use natural English fragments to describe computations, and use pseudocode to demonstrate high-level control flow.

Find the implementation within algs.day04.AnagramFinal. Show demonstration.

That’s right. It now takes 2 seconds to find an anagram of the 15 letters. This is the power of algorithms: Taking a brute-force computation from 31 days down to 2 seconds. 2,678,400 seconds down to 2 seconds. A million-fold speedup!

1.10 Sample Exam Question: solved

There is an alternate notation known as postfix notation. The above equation would be represented as "1 4 5 * 2 3 + * +". As its name implies, in postfix notation the operator comes after the arguments. Based on the structure of Dijkstra’s algorithm, devise a one stack solution to compute the value of this sequence of tokens.

On the exam, I would ask you to describe this algorithm using pseudocode:

This only shows for "+" and "*" but you can clearly see how to extend to other binary operators. This would be a suitable answer on an exam.

stack = new Stack while more input available s = next token if s is "*" then pop off last two values from stack and push back their product else if s is "+" then pop off last two values from stack and push back their sum else if s is value then push numeric interpretation of s onto stack pop value from stack and print it out

1.11 Sample Exam Question

This question assumes that you have a Stack of elements.

Write pseudocode for a function that takes a Stack of elements and modifies the stack so that its bottom two elements are swapped with each other.

1.12 Lecture Takeaways

1.13 Daily Exercise

Given the 10 base-10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8 and 9) can you construct a mathematical expression solely of addition that produces the value 100?

For example, 18 + 29 + 30 + 4 + 5 + 6 + 7 = 99. So close! Note how each digit is used exactly once. In trying to solve this problem, try to make statements of fact that you can use to guide your solution. For example, you could start by realizing that no three digit number would ever be used, because that is already over the target total of 100.

1.14 Version : 2015/11/03

(c) 2015, George Heineman