Solutions - Exam 2 CS 4341 C00

CS4341 Introduction to Artificial Intelligence. C2000
Solutions - Exam 2 - February 29, 2000

By Prof. Carolina Ruiz
Department of Computer Science
Worcester Polytechnic Institute

Instructions

Show your work
Justify your answers
Use the spaces provided to write your answers
Write your name on each page
Ask in case of doubt

NAME: Carolina Ruiz _______________________________________

Problem I. Constraint Satisfaction (20 points)

Suppose that a medical doctor needs your help constructing a constraint net to find prescriptions for his/her flu patients. The following table shows the symptoms exhibited by flu patients and the list of medications that can be prescribed for each symptom.

Symptom                Medications

Headache               H1, H2, H3
Runny nose             R1, R2
Fever                  F1, F2
Sore throat            S1, S2

Assume also that the FDA has found that each the following sets of medications presents dangerous drug interactions and so drugs in each set should not be prescribed together.

Drugs that must NOT be prescribed together

{H1,S1}
{H3,S2}
{H2,R2}
{F1,S1}

On the other hand, the following sets of drugs work in conjunction and so if one of them is prescribed then the other one must be prescribed as well.

Drugs that must be prescribed together

{H2,S2}
{S2,F1}

Answer the following questions:

(10 points) Construct a constraint net that helps the medical doctor in finding appropriate prescriptions for his/her flu patients. Provide the constraint matrix associated to each link in the net explicitly.


   ----------         M2         ------------
  | Headache |------------------| Runny Nose |
   ----------                    ------------
       |
    M1 |
       |
       |
  -------------                    -------
 | Sore Throat |------------------| Fever |
  -------------        M3          -------


 M1 |  S1  S2      M2 |  R1  R2     M3 |  F1  F2
 -------------     -------------    -------------
 H1 |  0   0       H1 |  1   1      S1 |  0   1
 H2 |  0   1       H2 |  1   0      S2 |  1   0
 H3 |  1   0       H3 |  1   1

(10 points) A patient may be allergic to some drugs. Hence, an appropriate prescription for a flu patient should consist of 4 medications, one for each symptom, satisfying the constraints above and to which the patient is not allergic. Find an appropriate prescription for Lucy, who is allergic to F1 and to R2. Describe in detail each of the steps of your problem solving strategy:

Arc Consistency Check

During this stage of the process, arcs are revised to eliminate
values that will no longer be valid due to Lucy's allergies.
We revise below the domain of each of the variables:


Fever: Reduced domain = {F2} 
F1 is eliminated because Lucy is allergic to it.

Runny Nose: Reduced domain = {R1} 
R2 is eliminated because Lucy is allergic to it.

Sore Throat: Reduced domain = {S1} 
S2 is eliminated because according to the constraint matrix M3
the only medication for sore throat that can be prescribed in 
conjunction with F2 is S1.

Headache: Reduced domain = {H3} 
H1 and H2 are eliminated because according to the constraint 
matrix M1 the only medication for headache that can be 
prescribed in conjunction with S1 is H3.
Note that H1 would be eliminated anyway because all entries 
in M1 corresponding to H1 are 0.

Depth First Search

Selection of the 1st node
Selection of each consecutive node

The search is quite simple since the reduced domain of each 
variable contains just one value. The only possible solution 
is {H3,R1,F2,S1}. Selecting the variables in any order will 
lead to the same valid solution in the same amount of time.

NAME: Carolina Ruiz _______________________________________

Problem II. Logic-Based Systems (20 points)

Consider the following set of axioms:

forall x [equal(x,x)]
forall y,z [equal(y,z) -> equal(z,y)]
forall w,s,t [equal(w,s) and equal (s,t) -> equal(w,t)]
equal(b,a)
equal(b,c)

And the conclusion:

equal(c,a)

As usual, a, b, and c denote constants; and x, y, z, w, s, and t denote variables.

Prove the conclusion from the axioms by refutation using resolution:

(10 points) Translate the axioms and the negation of the conclusion into clausal form:

     
      equal(x,x)
     
 !equal(y,z) or equal(z,y)
     
 ![equal(w,s) and equal(s,t)] or equal(w,t) 
 !equal(w,s) or !equal(s,t) or equal(w,t) 
     
 equal(b,a)
     
 equal(b,c)
     
 (negated conclusion:)  !equal(c,a)

(10 points) Apply resolution (state explicitly which substitutions are made):


2. !equal(y,z) or equal(z,y)
5. equal(b,c)
---------------------------------------------
7.  equal(c,b)                 substituting y by b, and z by c.

7.  equal(c,b)
3. !equal(w,s) or !equal(s,t) or equal(w,t) 
---------------------------------------------
8. !equal(b,t) or equal(c,t)   substituting w by c, and s by b.

8. !equal(b,t) or equal(c,t) 
4. equal(b,a)
---------------------------------------------
9. equal(c,a)                  substituting t by a.

9. equal(c,a)       
6. !equal(c,a)
----------------
   EMPTY CLAUSE

NAME: Carolina Ruiz _______________________________________

Problem III. Learning - Decision Trees (20 points)

The following table contains training examples that help a robot janitor predict whether or not an office contains a recycling bin.

STATUS DEPT. OFFICE SIZE RECYCLING BIN?
1. faculty ee large no
2. staff ee small no
3. faculty cs medium yes
4. student ee large yes
5. staff cs medium no
6. faculty cs large yes
7. student ee small yes
8. staff cs medium no

(15 points) Use information theory to construct a minimal decision tree that predicts whether or not an office contains a recycling bin. Show each step of the computation.

            x      1     2/3    1/3    1/2
     ---------------------------------------
      log_2(x)     0    -0.6   -1.6    -1



 Selecting the attribute for the root node:

                             YES                   NO

Status:

 Faculty:    (3/8)*[ -(2/3)*log_2(2/3) - (1/3)*log_2(1/3)]
 Staff:    + (3/8)*[ -(0/3)*log_2(0/3) - (3/3)*log_2(3/3)]
 Student:  + (2/8)*[ -(2/2)*log_2(2/2) - (0/2)*log_2(0/2)]
 TOTAL:    = 0.35 + 0 + 0 = 0.35

Size:

 Large:      (3/8)*[ -(2/3)*log_2(2/3) - (1/3)*log_2(1/3)]
 Medium:   + (3/8)*[ -(1/3)*log_2(1/3) - (2/3)*log_2(2/3)]
 Small:    + (2/8)*[ -(1/2)*log_2(1/2) - (1/2)*log_2(1/2)]
 TOTAL:    = 0.35 + 0.35 + 2/8 = 0.95  

Dept:

 ee:         (4/8)*[ -(2/4)*log_2(2/4) - (2/4)*log_2(2/4)]
 cs:       + (4/8)*[ -(2/4)*log_2(2/4) - (2/4)*log_2(2/4)]
 TOTAL:    = 0.5 + 0.5 = 1



Since status is the attribute with the lowest entropy,
it is selected as the root node:

                      
                     Status
                 /     |      \
       Faculty  /      | staff \ student
               /       |        \
         1-,3+,6+   2-,5-,8-   4+,7+
          BIN=?      BIN=NO   BIN=YES


Only the branch corresponding to Status=Faculty needs further 
processing. 


 Selecting the attribute to split the branch Status=Faculty:

                             YES                   NO

Dept:
 ee:         (1/3)*[ -(0/1)*log_2(0/1) - (1/1)*log_2(1/1)]
 cs:       + (2/3)*[ -(2/2)*log_2(2/2) - (0/2)*log_2(0/2)]
 TOTAL:    = 0 + 0 = 0


Since the minimum possible entropy is 0 and dept has that
minimum possible entropy, it is selected as the best attribute
to split the branch Status=Faculty.  Note that we don't even have
to calculate the entropy of the attribute size since it
cannot possibly be better than 0.

                      
                     Status
                 /     |      \
       Faculty  /      | staff \ student
               /       |        \
         1-,3+,6+   2-,5-,8-   4+,7+
           Dept      BIN=NO   BIN=YES
          /    \ 
      ee /      \ cs
        /        \
       1-       3+,6+
     BIN=NO    BIN=YES
        


Each branch ends in a homogeneous set of examples so the
construction of the decision tree ends here.

(5 points) Translate your decision tree into a collection of decision rules.

IF status=faculty AND dept=ee THEN recycling-bin=no

IF status=faculty AND dept=cs THEN recycling-bin=yes

IF status=staff THEN recycling-bin=no

IF status=student THEN recycling-bin=yes

NAME: Carolina Ruiz _______________________________________

Problem IV. Learning - Neural Networks (20 points)

Consider the following function of two variables:

        A  B  Desired Output
        0  0    1
        1  0    0
        0  1    0
        1  1    1

(5 points) Prove that this function cannot be learned by a single perceptron that uses the step function as its activation function.

This function cannot be computed using a single perceptron since
the function is not linearly separable:

      B
       ^
       |
       0    1
       |
       |
    ---1----0-------->
       |              A

(15 points) Explain how you would train a neural network (in which each unit uses the sigmoid function as its activation function) to learn this function by describing in detail each of the steps below:

(5 points) Topology of your neural nets:

How many hidden layers would you use?

     
Only one hidden layer is needed since this is a boolean 
function and one hidden layer is sufficient to represent 
any boolean function.

How many hidden units per layer?

     
I'll use just two hidden units since the function seems 
simple enough.

How many connections would your net have?

     
  4 connecting the input layer to the hidden layer
+ 2 connecting the hidden layer to the output layer

The 3 links connecting the fixed input -1 to each of 
the units can also be taken into account here.

Draw the neural net that would result from your choices above.

     
	       -1
                |
                V
               ---                 -1
     A ------>| C | ---------       | 
       \    /  ---           |      v
        \  /                 |     ---
         \/                   --> | E |
         /\                   -->  ---
        /  \                 |
       /    \  ---           |
     B ------>| D | ---------
               ---
                ^
                |
               -1

  input      hidden               output
  layer      layer                layer

(10 points) Training your neural net:

How would you select the initial weights of the connections?
```
I would select small (i.e. close to 0), random values. 
```

Explain how the error back propagation algorithm would work when applied to your neural net.

The algorithm will iterate the following set of steps:

 For each of the training examples:

Propagate the inputs forward through the net. 
Compute the output of each node (= the sigmoid 
function applied to the weighted sum of the inputs 
to the node) first for the hidden layer and then for 
the output layer.
Compute the errors (i.e. the betas) associated to 
each node, starting from the output layer and moving 
back to the hidden layer.
Compute the weight changes (i.e. the deltas) 
associated with the current training example for 
each of the connections in the net. 

After processing all examples, update the weight 
of each connection by assigning to it the sum of that 
weight's delta for each of the examples plus the current 
value of the weight.

until we obtain "satisfactory weights". (See the answer 
to the following question for more precise stopping 
criteria).

When would you stop the iterations of the error back propagation algorithm?

The following are possible stopping criteria:

When the updated weights after an iteration of the 
error back propagation procedure are almost identical to
the weights before that iteration.
This is in general the best stopping criterion.
When the output of the net is very close to the
desired one. Yeahhhh!
When the number of iterations exceeds a pre-defined
threshold. Say the procedure has run for 20,000 iterations
and it doesn't seem to be converging.
When the output error seems to be increasing instead
of decreasing.

What would you do if the trained neural net does not generate the desired outputs?

I would run the error back propagation again modifying
the parameters to correct what seems to be the problem 
with the current application of the procedure:


If the updated weights after an iteration of the 
error back propagation procedure are almost identical to
the weights before that iteration but the output is not
the desired one:
I would select new (small, random) initial weights and 
run the process again.
If the number of iterations exceeds a pre-defined
threshold. Say the procedure has run for 20,000 iterations
and it doesn't seem to be converging:
I would increase the learning rate and/or 
select new (small, random) initial weights and 
run the process again.
If the output error seems to be increasing instead
of decreasing:
I would decrease the learning rate and/or 
select new (small, random) initial weights and 
run the process again.

NAME: Carolina Ruiz _______________________________________

Problem V. Learning - Genetic Algorithms (20 points)

Consider a genetic algorithm in which the rank method is used to select candidates. Recall that, given a fixed value p between 0 and 1, the probability of picking the first-ranked candidate is p; the probability of picking the second-ranked candidate is p times the remaining probability (1-p), that is p*(1-p); the probability of picking the third-ranked candidate is p times the remaining probability (1-(p + p*(1-p))), that is p*(1-(p + p*(1-p))) = p*(1-p)(1-p); and so on, until only one candidate is left, in which case this candidate receives the remaining probability, so that the probabilities of all the individuals in the population sum up to 1.

(7 points) Suppose that you have five candidates. You decide to select one using the rank-space method with p=0.25. Compute the probabilities for each of the five as a function of rank.

  
    Individuals     Rank      Rank Fitness
    -----------------------------------------
        C1           1st         0.25

        C2           2nd         0.1875    = 0.25*(1-0.25)

        C3           3rd         0.140625  = 0.25*(1-0.4375)

        C4           4th         0.1054687 = 0.25*(1-0.578125) 

        C5           5th         0.3164063 = 1-0.6835937
    

Note that:

 The sum of the rank fitnesses of C1 + C2 = 0.4375
 The sum of the rank fitnesses of C1 + C2 + C3 = 0.578125 
 The sum of the rank fitnesses of C1 + C2 + C3 + C4 = 0.6835937 
 The sum of the rank fitnesses of C1 + C2 + C3 + C4 + C5 = 1

(7 points) What is peculiar about the rank fitnesses that you computed in the previous part?

That the probability of selecting the least fit element (i.e. C5)
is greater than the probabilities of selecting any of the other 
(fitter) individuals.

(6 points) Can you avoid the peculiarity you observed in part 2 by restricting the probability p to a particular range? Your answer should be independent of the number of individuals in the population.

Pick the constant p strictly greater than 0.5. In that way, 
it is guaranteed that the rank fitness is a monotonicly 
decreasing function of the rank, since each individual will 
consume, in turn, at least half of the remaining probability.

	STATUS	DEPT.	OFFICE SIZE	RECYCLING BIN?
1.	faculty	ee	large	no
2.	staff	ee	small	no
3.	faculty	cs	medium	yes
4.	student	ee	large	yes
5.	staff	cs	medium	no
6.	faculty	cs	large	yes
7.	student	ee	small	yes
8.	staff	cs	medium	no

CS4341 Introduction to Artificial Intelligence. C2000 Solutions - Exam 2 - February 29, 2000

By Prof. Carolina Ruiz Department of Computer Science Worcester Polytechnic Institute

Instructions

NAME: Carolina Ruiz _______________________________________

Problem I. Constraint Satisfaction (20 points)

Drugs that must NOT be prescribed together

Drugs that must be prescribed together

NAME: Carolina Ruiz _______________________________________

Problem II. Logic-Based Systems (20 points)

NAME: Carolina Ruiz _______________________________________

Problem III. Learning - Decision Trees (20 points)

NAME: Carolina Ruiz _______________________________________

Problem IV. Learning - Neural Networks (20 points)

NAME: Carolina Ruiz _______________________________________

Problem V. Learning - Genetic Algorithms (20 points)

CS4341 Introduction to Artificial Intelligence. C2000
Solutions - Exam 2 - February 29, 2000

By Prof. Carolina Ruiz
Department of Computer Science
Worcester Polytechnic Institute