Visitors

Kathi Fisler

In this lecture, we look at a common OO programming pattern called Visitors. Visitors are similar to passing functions as arguments, but they handle functions over class hierarchies instead of functions on single classes (as our functions-as-arguments lecture did). We’ll do this in the context of a concrete, real-world example.

1 Spreadsheet Formulas and Functions Over Them

Spreadsheets are formed of cells. Each cell can contain a concrete value or a formula that may reference other cells. References are specified by the row (labeled with numbers) and column (labeled with letters) of the desired cell. For example:

A | B | C

--------------------------------

1 | Hwk 1 | 120 |

2 | Hwk 2 | 100 |

3 | Hwk 3 | 115 |

4 | Total | =B1 + B2 + B3 |

The = in cell B4 indicates that the value of the cell is computed according to the given formula.

Let’s capture spreadsheet formulas through data structures. The following classes achieve this. The Examples class shows how to create formulas in this representation:

  interface IFormula {}

  class Num implements IFormula {
    int value;
    Num(int value) {
      this.value = value;
    }
  }

  class CellRef implements IFormula {
    String cellname;

    CellRef(String cellname) {
      this.cellname = cellname;
    }
  }

  class Plus implements IFormula {
    IFormula left;
    IFormula right;

    Plus(IFormula left, IFormula right) {
      this.left = left;
      this.right = right;
    }
  }

  class Examples {
    Examples(){}

    Num num2 = new Num(2);
    Num num5 = new Num(5);
    CellRef a10 = new CellRef("a10");
    Plus f1 = new Plus(num2, num5);
    Plus f2 = new Plus(a10, num5);
  }

For now, we are interested in two functions on spreadsheet formulas: one checks whether a formula contains any references to other cells, the other computes the value of a formula (assuming it has no references to other cells – a spreadsheet would have to compute values for all referenced cells before evaluating a formula). We add these to the IFormula interface as follows:

  interface IFormula {
    // does formula reference other cells?
    boolean noRefs();
    // compute value of formula; ASSUMES noRefs
    int valueOf() ;
  }

Implementations of these operations are shown in the following classes:

  class Num implements IFormula {
    int value;
    Num(int value) {
      this.value = value;
    }

    public boolean noRefs() {
      return true;
    }

    public int valueOf() {
      return this.value;
    }

  }

  class CellRef implements IFormula {
    String cellname;

    CellRef(String cellname) {
      this.cellname = cellname;
    }

    public boolean noRefs() {
      return false;
    }

    public int valueOf() {
      throw new RuntimeException("Unresolved cell reference");
    }

  }

  class Plus implements IFormula {
    IFormula left;
    IFormula right;

    Plus(IFormula left, IFormula right) {
      this.left = left;
      this.right = right;
    }

    public boolean noRefs() {
      return this.left.noRefs() && this.right.noRefs();
    }

    public int valueOf() {
      return this.left.valueOf() + this.right.valueOf();
    }

  }

  class Examples {
    Examples(){}

    Num num2 = new Num(2);
    Num num5 = new Num(5);
    CellRef a10 = new CellRef("a10");
    Plus f1 = new Plus(num2, num5);
    Plus f2 = new Plus(a10, num5);

    boolean test1(Tester t) {
      return t.checkExpect(num2.noRefs(),true);
    }

    boolean test2(Tester t) {
      return t.checkExpect(num2.valueOf(),2);
    }

    boolean test3(Tester t) {
      return t.checkExpect(a10.noRefs(),false);
    }

    boolean test4(Tester t) {
      return t.checkExpect(f1.noRefs(), true);
    }

    boolean test5(Tester t) {
      return t.checkExpect(f2.noRefs(), false);
    }

    boolean test6(Tester t) {
      return t.checkExpect(f2.noRefs(), false);
    }

  }

Make sure you understand this problem and the code before you move on.

2 Abstracting Over the Traversal

Both functions traverse the expression tree and combine the results at each node into the final answer. The computations done at each node are quite different across the two functions (one returns an int while the other returns a boolean, for starters), but the traversals themselves are the same in that we visit every node, recursively process any children nodes, then combine results from the children into the overall function result. We want to write a single abstraction function that we can customize with the computations per node in the tree.

Doing this requires addressing two issues: we have to write a traverse method over expressions that is sure to visit every node, and we have to parameterize traverse over the functions that process each type of node. We have to address these simultaneously.

How should we pass the functions for node processing to the traverse method? Last week, we saw that we can pass functions by wrapping them up in methods. We can do the same here. However, rather than pass three separate function-objects (one for each of Num, CellRef, and Plus expressions), we will pass a single object that contains the function parameters for all three node types. This is much cleaner, as it will let us add new kinds of nodes (for subtraction, averaging, etc) without having to edit the parameters to traverse.

2.1 An Interface for Formula Functions

Let’s first define an interface for our function-objects. Here, we call it IProc, and require it to provide a method for each node type. Since different traversals compute different answers, this interface must be parameterized over the return type of the overall function:

interface IProc<R> {

R processNum(Num n);

R processCellRef(CellRef c);

R processPlus(R leftres, R rightres);

}

The processPlus method will consume the results of each of the left and right subtrees. As you’ll see in a moment, this is consistent with putting all of the traversal logic in a single method.

2.2 Implementing Traverse

Next, we write traverse, which is a method over IFormula. Let’s start by adding Traverse to the IFormula interface. (Since traverse is designed to abstract over the noRefs and valueOf methods that we had in the original version, we have removed those two methods from the interface.) traverse takes in an IProc, which is parameterized over its return type (R), and also returns a value of type R.

interface IFormula {

R traverse(IProc<R> f);

}

What is R though? Written as shown, the Java compiler will complain that R is an undefined type (which it is). You might be tempted to add R as a type parameter to IFormula, but that makes no sense either (formulas are independent of the return types of functions that process them). Really, each use of traverse should get a return type R based on the return type of the IProc that is passed to it. We need a way to communicate this to Java. The following syntax achieves this:

interface IFormula {

<R> R traverse(IProc<R> f);

}

The first <R> tells Java that another part of the method declaration will supply a value for R (in this case, the value passed to IProc).

Now, we need to implement traverse in each of the variants of IFormula. For Num and CellRef, there isn’t much to do other than pass the current node to the corresponding processing method. For Plus nodes, we have to traverse the left and right subtrees and pass their results to the processing method for Plus nodes. The new IFormula classes appear as follows:

  class Num implements IFormula {
    int value;
    Num(int value) {
      this.value = value;
    }

    public <R> R traverse(IProc<R> f) {
      return f.processNum(this);
    }
  }

  class CellRef implements IFormula {
    String cellname;

    CellRef(String cellname) {
      this.cellname = cellname;
    }

    public <R> R traverse(IProc<R> f) {
      return f.processCellRef(this);
    }
  }

  class Plus implements IFormula {
    IFormula left;
    IFormula right;

    Plus(IFormula left, IFormula right) {
      this.left = left;
      this.right = right;
    }

    public <R> R traverse(IProc<R> f) {
      return f.processPlus(this.left.traverse(f), this.right.traverse(f));
  }
  }

2.3 Passing Methods to Traversals

All that remains now is creating IProc objects that show how to compute each of noRefs and valueOf through traverse. Basically, we take the method bodies from the original code and make them the bodies of the node-processing methods required in the IProc interface:

class NoRefs implements IProc<Boolean> {

public Boolean processNum(Num n) {

return true;

}

public Boolean processCellRef(CellRef c) {

return false;

}

public Boolean processPlus(Boolean leftres, Boolean rightres) {

return leftres && rightres;

}

class ValueOf implements IProc<Integer> {

public Integer processNum(Num n) {

return n.value;

}

public Integer processCellRef(CellRef c) {

throw new RuntimeException("Unresolved cell reference");

}

public Integer processPlus(Integer leftres, Integer rightres) {

return leftres + rightres;

}

Notice that each object supplies a concrete return type for IProc. Similarly, processPlus assumes that its two inputs have the expected return type for the overall function.

3 Summary

This example makes the separation of traversal from processing clearer than our categorized-menu example. Both are examples of visitors though, in that the core code visits all the specific classes containing data, and the function-objects indicate how to process that data. Hopefully, the combination of the two gives you a clearer picture of how visitors work.

That may still leave you wondering why anyone would write code using visitors though. The reasoning is clearer on this example than on the categorized-menu one. Traversing data structures is a core operation. We traverse trees, for example, every time we check or restore an invariant, or do some other computation over trees. Visitors allow us to write the traversal once. This can be helpful for two reasons:

Providing code libraries. There isn’t much point providing a tree or list library without providing methods for traversing them. Visitors are how you customize traversals to different methods.
Avoiding traversal errors. It can be easy to forget to traverse part of a complex data structure (how often did you lose a recursive case in 1101/2 if you forgot to use the template?) Abstracting over the traversal simply reduces the chance that you’ll make a traversal error in your code.

There is another interesting angle to this story. Tune in next class ...

1	Spreadsheet Formulas and Functions Over Them
2	Abstracting Over the Traversal
3	Summary