1 Well Begun Is Half Done
1.1 Important concepts from readings
1.2 Opening Questions
1.3 Data Types
1.3.1 Encapsulation
1.3.2 Design APIs
1.4 Demonstration: Bisection computation
1.5 Dijkstra’s Two-Stack Algorithm
1.6 Stacks appear in many places
1.7 Lecture Takeaways
1.8 Daily Exercises
1.9 Version : 2015/11/03

CS 2223 Oct 30 2015

Lecture Path: 03

Expected reading: pp. 96-99, 121-129
Assume you know: pp. 64-89
Daily Exercise: Sorting Five Values

Wirth’s Law: Software is getting slower more rapidly than hardware becomes faster.
Niklaus Wirth

1 Well Begun Is Half Done

1.1 Important concepts from readings

Today I asked you to read about three fundamental data types in computer science. These are Bag, Stack and Queue. Two of these (Stack and Queue) are already implemented by the JDK. You need to understand the interfaces for all three data types. You need to understand the choices you make when implementing them.

1.2 Opening Questions

First, let's complete an accurate count of the comparisons needed to find the greatest and second greatest elements in a collection of N elements. Given just a small sample, you can see the following:

Size    Number of Operations
2       1
3       3
4       4
7       8
8       9

You can find the top two values in at most n + ceiling(log(n)) - 2 comparisons, where log is base 2.

If you reflect on the bracket idea I introduced, you can see that it still takes n-1 comparisons to locate the winner. Along the way you have derived useful information to help find the second largest. It can only be on the "path" of the tournament from the winner down to its first position; that is, it must be one of the elements that lost directly to the winner. Given n elements, the height of the bracket will be ceiling(log(n)), that is, the smallest integer greater than or equal to the real number log(n), with log taken base 2. If we start with the final two teams in the tournament as our "#1" and "#2", then we only need to compare "#2" against the other ceiling(log(n)) - 1 elements that lost to the winner to find the second largest. Thus the maximum number of comparisons to find the top two numbers is n - 1 + ceiling(log(n)) - 1, or n + ceiling(log(n)) - 2.
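The tournament argument above can be sketched in Java as follows. This is a minimal illustration, not the course's implementation: the class name `TopTwo`, the method `topTwo`, and the choice to give the last element a bye in odd-sized rounds are all assumptions made for this example. The method returns the largest value, the second largest value, and the number of comparisons it performed, so the count can be checked against n + ceiling(log(n)) - 2.

```java
import java.util.ArrayList;
import java.util.List;

class TopTwo {
    /** Returns {largest, secondLargest, comparisonsUsed}; requires a.length >= 2. */
    static int[] topTwo(int[] a) {
        int comparisons = 0;
        List<List<Integer>> beaten = new ArrayList<>(); // values each index defeated directly
        List<Integer> round = new ArrayList<>();        // indices still in the tournament
        for (int i = 0; i < a.length; i++) {
            beaten.add(new ArrayList<>());
            round.add(i);
        }

        // Play rounds until one winner remains: n-1 comparisons in total.
        while (round.size() > 1) {
            List<Integer> next = new ArrayList<>();
            for (int i = 0; i + 1 < round.size(); i += 2) {
                int x = round.get(i), y = round.get(i + 1);
                comparisons++;
                int w = a[x] >= a[y] ? x : y;
                int l = (w == x) ? y : x;
                beaten.get(w).add(a[l]);
                next.add(w);
            }
            if (round.size() % 2 == 1) {
                next.add(round.get(round.size() - 1)); // odd one out gets a bye
            }
            round = next;
        }

        // The runner-up must be among the elements the winner defeated directly.
        int winner = round.get(0);
        List<Integer> candidates = beaten.get(winner);
        int second = candidates.get(0);
        for (int i = 1; i < candidates.size(); i++) {
            comparisons++;
            if (candidates.get(i) > second) second = candidates.get(i);
        }
        return new int[] { a[winner], second, comparisons };
    }

    public static void main(String[] args) {
        int[] r = topTwo(new int[] { 3, 1, 4, 1, 5, 9, 2, 6 });
        System.out.println(r[0] + " " + r[1] + " using " + r[2] + " comparisons");
    }
}
```

Because the winner plays at most one match per round and there are ceiling(log(n)) rounds, the second loop never performs more than ceiling(log(n)) - 1 comparisons.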

Let’s start with the question I asked you to consider on Thursday:

Here is my solution. Discuss.


What do you make of the observation that 5 + ceiling(log(5)) - 2 = 6?

This question is of theoretical interest because we still don't know the fewest number of comparisons needed to sort sixteen elements. The latest research from 2011 confirms that 46 comparisons suffice and that 44 do not. You would make international news if you could prove that 45 comparisons suffice. Find the table of best comparison results on Wikipedia.

1.3 Data Types

The famous computer scientist Niklaus Wirth published an influential book called "Algorithms + Data Structures = Programs" (1976). You need to understand these fundamental data types so you will be ready to tackle the more advanced ones later in this course.

Operation   Bag          Queue           Stack
add         add(Item)    enqueue(Item)   push(Item)
remove      --           dequeue()       pop()
size        size()       size()          size()
isEmpty     isEmpty()    isEmpty()       isEmpty()

We will look at each of these types in turn, as well as the fundamental core API. The notion of an API is found in the discussion on pages 96-99. Here are the highlights:

These data types were designed to support well-defined behaviors but they also reflect common real-world scenarios. A Queue, for example, is a good way to model a line in a cafeteria. The first one into the line gets the food before everyone else behind him. A Stack can represent the trays in a cafeteria: You grab the topmost tray, which was the last one placed there. Finally a Bag might represent the coins in the cash register. When the cashier wants to grab a quarter, he just grabs one at random from the bin containing all quarters.
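To make the cafeteria analogy concrete, here is a small sketch that feeds the same items into a queue and a stack and records the order in which they come back out. It uses `java.util.ArrayDeque` from the JDK for both, which is a choice made for this example rather than the implementation discussed in the readings; the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Queue;

class CafeteriaDemo {
    /** Order in which people are served from a line: first in, first out. */
    static String lineOrder(String... people) {
        Queue<String> line = new ArrayDeque<>();
        for (String p : people) line.add(p);               // enqueue at the back
        StringBuilder sb = new StringBuilder();
        while (!line.isEmpty()) sb.append(line.remove()).append(' '); // dequeue from the front
        return sb.toString().trim();
    }

    /** Order in which trays are taken from a pile: last in, first out. */
    static String trayOrder(String... trays) {
        Deque<String> pile = new ArrayDeque<>();
        for (String t : trays) pile.push(t);               // push onto the top
        StringBuilder sb = new StringBuilder();
        while (!pile.isEmpty()) sb.append(pile.pop()).append(' ');    // pop from the top
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(lineOrder("a", "b", "c"));  // a b c
        System.out.println(trayOrder("a", "b", "c"));  // c b a
    }
}
```

A Bag has no remove operation at all, so there is no removal order to demonstrate for it.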

We will be sure to separate discussions of a Data Type from implementation discussions of a Data Structure.

1.3.1 Encapsulation

A well-designed data type does not reveal its implementation, because that is irrelevant to the outside world. It is kept private and hidden. That said, it will always reveal (explicitly or not) its performance, which can be documented. The fundamental principle is to describe the time it takes to perform an operation in relation to the problem size N. We started this discussion on Oct 29 2015 and we will be prepared to complete it on Nov 02 2015.

This relates to material found on pages 172-177.
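As an illustration of encapsulation, consider the following sketch of a stack of integers. The class name `IntStack` and the resizing-array representation are assumptions chosen for this example; the point is only that the array and the counter are private, while the performance of each operation is stated in its documentation.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;

class IntStack {
    private int[] items = new int[4];  // hidden representation; callers never see it
    private int n = 0;                 // number of values currently stored

    /** Adds x to the top. Amortized constant time (the array doubles when full). */
    void push(int x) {
        if (n == items.length) items = Arrays.copyOf(items, 2 * n);
        items[n++] = x;
    }

    /** Removes and returns the top value. Constant time. */
    int pop() {
        if (n == 0) throw new NoSuchElementException("stack is empty");
        return items[--n];
    }

    /** Constant time. */
    int size() { return n; }

    /** Constant time. */
    boolean isEmpty() { return n == 0; }

    public static void main(String[] args) {
        IntStack s = new IntStack();
        for (int i = 1; i <= 10; i++) s.push(i);
        System.out.println(s.pop() + " " + s.size());  // 10 9
    }
}
```

The array could later be swapped for a linked list without changing any code that uses `IntStack`, although the documented performance might then change.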

1.3.2 Design APIs

A data type is defined by the behavior that it supports. In this regard, an array is not a data type but a fundamental data structure, much like an int represents 32-bit integers while a long represents 64-bit integers.

Page 97 of the book lists a number of pitfalls when designing the interface to a data type. In many ways, this is an engineering problem, and most people can recognize a well-designed interface when trying to use it for the first time.

We will aim for the following attributes when designing data type interfaces:

1.4 Demonstration: Bisection computation

I didn’t get to this demonstration yesterday.

One often needs to compute a root of a continuous real function. A root is where the function f(x) crosses the x-axis, or in other words, where f(x) = 0. The Bisection method works for continuous functions if you are given two values, a and b, for which f(a) and f(b) have opposite signs.

See code in algs.experiment.bisection.BisectionMethod.

    int signA = (int) Math.signum(f.compute(a));
    int signB = (int) Math.signum(f.compute(b));
    double c = (a+b)/2;
    while (Math.abs(b-a) > threshold) {
      c = (a+b)/2;
      int signC = (int) Math.signum(f.compute(c));
      if (signA == signC) {
        a = c;
      } else {
        b = c;
      }
    }

The question to ask is this: How many times does this loop execute?
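One way to explore this question is to count the iterations directly. Each pass through the loop halves the width of the interval, so the loop should stop after about ceiling(log((b-a)/threshold)) passes, with log taken base 2. The sketch below is an illustration under assumptions: `java.util.function.DoubleUnaryOperator` stands in for the function object `f` used in the lecture code, and the class and method names are hypothetical. The predicted count relies on floating-point logarithms, so it could be off by one when the ratio is an exact power of two.

```java
import java.util.function.DoubleUnaryOperator;

class BisectionCount {
    /** Runs the bisection loop and returns how many times its body executed. */
    static int iterations(DoubleUnaryOperator f, double a, double b, double threshold) {
        int signA = (int) Math.signum(f.applyAsDouble(a));
        int count = 0;
        while (Math.abs(b - a) > threshold) {
            double c = (a + b) / 2;
            int signC = (int) Math.signum(f.applyAsDouble(c));
            if (signA == signC) { a = c; } else { b = c; }
            count++;   // the interval width was just cut in half
        }
        return count;
    }

    /** Predicted count: halvings needed to shrink |b-a| to at most threshold. */
    static int predicted(double a, double b, double threshold) {
        double halvings = Math.log(Math.abs(b - a) / threshold) / Math.log(2);
        return Math.max(0, (int) Math.ceil(halvings));
    }

    public static void main(String[] args) {
        // f(x) = x^2 - 2 has a root at sqrt(2), between 1 and 2.
        System.out.println(iterations(x -> x * x - 2, 1.0, 2.0, 1e-6));
        System.out.println(predicted(1.0, 2.0, 1e-6));
    }
}
```

Note that the count depends only on the starting width and the threshold, not on the function itself.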

1.5 Dijkstra’s Two-Stack Algorithm

This algorithm on page 129 uses two stacks to process infix expressions with parentheses.

How do you evaluate the expression "(1 + ((2+3) * (4*5)))"? Observe behavior and discuss.

You must walk through the trace on pages 130-131 to verify you see how this algorithm works. On the homework you will see where it breaks down.
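For reference while walking through the trace, here is a minimal sketch of the two-stack idea in Java. It is not the book's code: the class name `TwoStack` is hypothetical, it scans characters so the expression can be written without spaces between tokens, it supports only the four binary operators, and it assumes the expression is fully parenthesized.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class TwoStack {
    /** Evaluates a fully parenthesized infix expression of non-negative numbers. */
    static double evaluate(String expr) {
        Deque<Character> ops = new ArrayDeque<>();  // pending operators
        Deque<Double> vals = new ArrayDeque<>();    // pending operand values
        int i = 0;
        while (i < expr.length()) {
            char ch = expr.charAt(i);
            if (Character.isDigit(ch)) {
                // Read a whole number (digits and an optional decimal point).
                int j = i;
                while (j < expr.length()
                        && (Character.isDigit(expr.charAt(j)) || expr.charAt(j) == '.')) {
                    j++;
                }
                vals.push(Double.parseDouble(expr.substring(i, j)));
                i = j;
                continue;
            }
            if (ch == '+' || ch == '-' || ch == '*' || ch == '/') {
                ops.push(ch);
            } else if (ch == ')') {
                // A right parenthesis triggers one computation.
                char op = ops.pop();
                double right = vals.pop();
                double left = vals.pop();
                switch (op) {
                    case '+': vals.push(left + right); break;
                    case '-': vals.push(left - right); break;
                    case '*': vals.push(left * right); break;
                    default:  vals.push(left / right); break;
                }
            }
            // Left parentheses and whitespace are ignored.
            i++;
        }
        return vals.pop();
    }

    public static void main(String[] args) {
        System.out.println(evaluate("(1 + ((2+3) * (4*5)))"));  // 101.0
    }
}
```

Notice that left parentheses are never inspected; only a right parenthesis causes an operator to be applied.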

Sample Exam Question:

There is an alternate notation known as postfix notation. The above expression would be represented as "1 2 3 + 4 5 * * +". As its name implies, in postfix notation the operator comes after the arguments. Based on the structure of Dijkstra's algorithm, devise a one-stack solution to compute the value of this sequence of tokens.

On the exam, I would ask you to describe this algorithm using pseudocode, which we will be ready to discuss starting Monday.
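As a study aid, one possible shape for a one-stack evaluator is sketched below in Java rather than pseudocode. The class name `PostfixEval` is hypothetical, and the sketch assumes whitespace-separated tokens, well-formed input, and only the four binary operators.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class PostfixEval {
    /** Evaluates whitespace-separated postfix tokens using a single stack of values. */
    static double evaluate(String expr) {
        Deque<Double> vals = new ArrayDeque<>();
        for (String t : expr.trim().split("\\s+")) {
            if (t.length() == 1 && "+-*/".contains(t)) {
                // An operator consumes the two most recent values.
                double right = vals.pop();
                double left = vals.pop();
                switch (t.charAt(0)) {
                    case '+': vals.push(left + right); break;
                    case '-': vals.push(left - right); break;
                    case '*': vals.push(left * right); break;
                    default:  vals.push(left / right); break;
                }
            } else {
                vals.push(Double.parseDouble(t));  // an operand is simply pushed
            }
        }
        return vals.pop();
    }

    public static void main(String[] args) {
        // A postfix form of (1 + ((2+3) * (4*5))).
        System.out.println(evaluate("1 2 3 + 4 5 * * +"));  // 101.0
    }
}
```

No operator stack is needed because each operator arrives only after both of its arguments are already on the value stack.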

Note that this is a practical problem. Have you heard of "PostScript"? It is a stack-based page description language that uses postfix notation, and it is the predecessor of the high-quality documents that, today, are known as PDF documents.

1.6 Stacks appear in many places

There are three engines on the left "A B C" in that order from left to right. You can move an engine straight through to the right track or it can move onto the spur track. Once on the spur track, the engine can only move onto the right track.

Figure 1: Track Manipulation

What is the only permutation of these three engines that you cannot reproduce on the right track?
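If you want to check your answer by experiment, the spur track can be modeled as a stack and every legal sequence of moves enumerated. The sketch below rests on assumptions, since it cannot see the figure: the string passed to `reachable` lists the engines in the order they arrive at the switch, each result lists the engines in the order they reach the right track, and moving "straight through" is modeled as entering the spur and leaving it immediately. The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import java.util.TreeSet;

class SpurTrack {
    /** All orders in which the engines can reach the right track. */
    static Set<String> reachable(String arrival) {
        Set<String> out = new TreeSet<>();
        Deque<Character> spur = new ArrayDeque<>();
        explore(arrival, 0, spur, "", out);
        return out;
    }

    private static void explore(String arrival, int next, Deque<Character> spur,
                                String right, Set<String> out) {
        if (right.length() == arrival.length()) {
            out.add(right);   // every engine has reached the right track
            return;
        }
        if (next < arrival.length()) {
            // Option 1: the next arriving engine moves onto the spur.
            spur.push(arrival.charAt(next));
            explore(arrival, next + 1, spur, right, out);
            spur.pop();
        }
        if (!spur.isEmpty()) {
            // Option 2: the engine at the open end of the spur moves to the right track.
            char e = spur.pop();
            explore(arrival, next, spur, right + e, out);
            spur.push(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(reachable("ABC"));
    }
}
```

Compare the printed set against all six permutations of the three engines to see which one is missing under these assumptions.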

1.7 Lecture Takeaways

1.8 Daily Exercises

I described how the Anagram searcher would use a brute force approach to compute all different possible permutations of the words "dry oxtail in rear" to find a fifteen-letter word. Note: these implementations can be found in the Git Repository under algs.days.day02. I didn’t complete the analysis on its speed, or "how many words per second". Here are some raw results:

dryoxtailinrear
50000000,dryxaeaonlritri,Thu Oct 29 20:51:59 EDT 2015
100000000,drytinrxarielao,Thu Oct 29 20:53:44 EDT 2015
150000000,dryaeilxrtarnio,Thu Oct 29 20:55:30 EDT 2015
200000000,drylotairaeixrn,Thu Oct 29 20:57:15 EDT 2015
250000000,dryitarixalrneo,Thu Oct 29 20:59:01 EDT 2015
300000000,drynlreaairitxo,Thu Oct 29 21:00:47 EDT 2015
350000000,dryreiliaoatxrn,Thu Oct 29 21:02:32 EDT 2015
400000000,dryaoairxnleirt,Thu Oct 29 21:04:17 EDT 2015
450000000,dryraoxrnaeilit,Thu Oct 29 21:06:03 EDT 2015
500000000,droyiearaintxrl,Thu Oct 29 21:07:48 EDT 2015

So this takes about 1 minute 45 seconds for every 50,000,000 words. Not bad! But remember, this is trying to check 15! = 1,307,674,368,000 possible 15-letter words. At this rate, it will take about 45,768 minutes, or 31.78 days.
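The estimate can be reproduced with a few lines of arithmetic. The sketch below is only a check of the numbers quoted above; the class and method names are hypothetical, and the rate of 105 seconds per 50,000,000 words is read off the timestamps in the raw results.

```java
class AnagramEstimate {
    /** n! as a long; 15! still fits comfortably in 64 bits. */
    static long factorial(int n) {
        long r = 1;
        for (int i = 2; i <= n; i++) r *= i;
        return r;
    }

    /** Minutes needed to try every ordering of the given number of letters. */
    static double minutesNeeded(int letters, long wordsPerBatch, double secondsPerBatch) {
        double batches = (double) factorial(letters) / wordsPerBatch;
        return batches * secondsPerBatch / 60.0;
    }

    public static void main(String[] args) {
        double minutes = minutesNeeded(15, 50_000_000L, 105.0);
        System.out.printf("%.0f minutes, %.2f days%n", minutes, minutes / (60 * 24));
    }
}
```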

Can you come up with strategies that might be used to improve the performance of this algorithm, given the above output and the sample list of English words?

1.9 Version : 2015/11/03

(c) 2015, George Heineman