CS 4341 D01 - Homework Solutions

Computer Science Department

CS4341 Introduction to Artificial Intelligence
SOLUTIONS Homework - D 2001

By Songting Chen and Carolina Ruiz

This homework assignment is OPTIONAL. It will be counted as 10% of extra credit towards your final grade and is intended as practice for the final exam.

Instructions

Show your work
Justify your answers
Use the space provided to write your answers
Ask in case of doubt
This is an INDIVIDUAL homework.
This homework assignment is OPTIONAL. It is worth 10% of extra credit towards your final numeric score.
The deadline for this OPTIONAL homework is Friday, April 27, 2001 at 5 pm. Please turn in a HARD COPY of your homework assignment by this deadline.

Homework Problems: This homework consists of six problems:

Problem 1: Decision Trees (20 points)
Problem 2: Genetic Algorithms (20 points)
Problem 3: Neural Networks (20 points)
Problem 4: Logic-based Systems (20 points)
Problem 5: Planning (20 points)
Problem 6: Machine Vision (20 points).

Problem 1. Decision Trees (20 points)

The following table contains training examples that help predict whether a patient is likely to have a heart attack.

PATIENT ID CHEST PAIN? MALE? SMOKES? EXERCISES? HEART ATTACK?
1. yes yes no yes yes
2. yes yes yes no yes
3. no no yes no yes
4. no yes no yes no
5. yes no yes yes yes
6. no yes yes yes no

(15 points) Use information theory to construct a minimal decision tree that predicts whether or not a patient is likely to have a heart attack. SHOW EACH STEP OF THE COMPUTATION.

            x      1  1/4  2/3  3/4  1/3  1/2
     ---------------------------------------
      log_2(x)     0   -2 -0.6 -0.4 -1.6  -1


 Selecting the attribute for the root node:

Heart Attack                    YES                   NO


Chest Pain:

 Yes:    (3/6)*[ -(3/3)*log_2(3/3) - (0/3)*log_2(0/3)]
 No:     (3/6)*[ -(1/3)*log_2(1/3) - (2/3)*log_2(2/3)]
 TOTAL:    = 0 + 0.47  = 0.47

Male:

 Yes:    (4/6)*[ -(2/4)*log_2(2/4) - (2/4)*log_2(2/4)]
 No:     (2/6)*[ -(2/2)*log_2(2/2) - (0/2)*log_2(0/2)]
 TOTAL:    = 0.667 + 0 = 0.667  

Smokes:

 Yes:    (4/6)*[ -(3/4)*log_2(3/4) - (1/4)*log_2(1/4)]
 No:     (2/6)*[ -(1/2)*log_2(1/2) - (1/2)*log_2(1/2)]
 TOTAL:    = 0.53 + 0.33 = 0.86

Exercises:

 Yes:    (4/6)*[ -(2/4)*log_2(2/4) - (2/4)*log_2(2/4)]
 No:     (2/6)*[ -(2/2)*log_2(2/2) - (0/2)*log_2(0/2)]
 TOTAL:    = 0.667 + 0 = 0.667



Since Chest Pain is the attribute with the lowest entropy,
it is selected as the root node:

                      
                   Chest Pain
                 /           \
           Yes  /             \  No
               /               \
         1+,2+,5+             3+,4-,6-
          HA=YES                HA=?


Only the branch corresponding to Chest Pain = No needs further 
processing. 


 Selecting the attribute to split the branch Chest Pain = No: 
Only Examples 3,4,6 need to be taken into consideration.

                             YES                   NO

Male:
 Yes:        (2/3)*[ -(0/2)*log_2(0/2) - (2/2)*log_2(2/2)]
 No:         (1/3)*[ -(1/1)*log_2(1/1) - (0/1)*log_2(0/1)]
 TOTAL:    = 0 + 0 = 0


Since the minimum possible entropy is 0 and Male has that
minimum possible entropy, it is selected as the best attribute
to split the branch Chest Pain=No.  Note that we don't even have
to calculate the entropy of other two attributes, i.e., 
Smokes,Exercises, since they cannot possibly be better than 0.
BTW, note that the entropy of Exercises is also 0 over examples 3, 4, 6
and so it could also have been used to split this node.
                      
                   Chest Pain
                 /           \
           Yes  /             \  No
               /               \
         1+,2+,5+             3+,4-,6-
          HA=YES                Male
	                        /    \ 
                           Yes /      \ No
                              /        \
                            4-,6-       3+
                           HA=NO     HA=YES
        


Each branch ends in a homogeneous set of examples so the
construction of the decision tree ends here.

(5 points) Translate your decision tree into a collection of decision rules.

IF Chest Pain=Yes THEN Heart Attack = Yes

IF Chest Pain=No AND Male=Yes THEN Heart Attack = No

IF Chest Pain=No AND Male=No THEN Heart Attack = Yes

Problem 2. Genetic Algorithms (20 points)

Consider a genetic algorithm in which individuals are represented a triples (x,y,z), where x, y, and z are non-negative real numbers. Let the fitness function be given by f(x,y,z) = (x*x) + (y*y) + z.

Complete the following table showing the probabilities of selecting each of the individuals below according to the different selection methods. For the diversity method, suppose that the first individual in the table (4,7,2) has been selected. Compute the diversity rank for all of the remaining individuals with respect to that first individual. The distance between two individuals will be given by the Euclidean distance between them, i.e. d((x,y,z),(w,s,t)) = sqrt((x-w)^2 + (y-s)^2 + (z-t)^2), where a^2 means a to the 2nd power, or in other words a*a.

NOTE: Following Winston's notation, below I'm calling "fitness" (in quotes) the probability that an indiviual has of being selected under a given selection method.

SHOW EACH STEP OF YOUR WORK.

Individuals

Fitness
or Quality Rank

Standard
"fitness" Rank
"fitness"
p=0.55 1
----
d^2 Diversity
rank Diversity
+ Quality
"fitness"

(4, 7, 2) 67 1 0.52 0.55 inf. 5 0.041

(2, 5, 3) 32 2 0.25 0.248 0.111 4 0.05

(0, 2, 7) 11 4 0.085 0.05 0.0152 2 0.248

(0, 4, 3) 19 3 0.147 0.111 0.0385 3 0.111

(0, 0, 0) 0 5 0 0.041 0.0145 1 0.55

Calculation: 
1.Quality of (4,7,2) = 4^2 + 7^2 + 2 = 67; Rank 1
   Quality of (2,5,3) = 2^2 + 5^2 + 3 = 32; Rank 2 
   Quality of (0,2,7) = 0^2 + 2^2 + 7 = 11; Rank 4
   Quality of (0,4,3) = 0^2 + 4^2 + 3 = 19; Rank 3
   Quality of (0,0,0) = 0^2 + 0^2 + 0 = 0; Rank 5

2.Total = 11+ 32 + 67 + 19 + 0=129;
   Standard "fitness" of (4,7,2) = 67/129 = 0.52;
   Standard "fitness" of (2,5,3) = 32/129 = 0.25;
   Standard "fitness" of (0,2,7) = 11/129 = 0.085;
   Standard "fitness" of (0,4,3) = 19/129 = 0.147;
   Standard "fitness" of (4,7,2) = 0/129 = 0;
   
3.Rank "fitness" of (4,7,2) = p = 0.55
   Rank "fitness" of (2,5,3) = p(1-p) = 0.55*0.45 = 0.25
   Rank "fitness" of (0,2,7) = p(1-p)(1-p)(1-p) = 0.05
   Rank "fitness" of (0,4,3) = p(1-p)(1-p) = 0.111
   Rank "fitness" of (0,0,0) = 1 - all of above = 0.041

4.1/d^2 of (4,7,2) = 1 / (0) = inf. Rank 5
   1/d^2 of (2,5,3) = 1 / (4+4+1) = 0.111 Rank 4
   1/d^2 of (0,2,7) = 1 / (16+25+25) = 0.0152 Rank 2
   1/d^2 of (0,4,3) = 1 / (16+9+1) = 0.0385 Rank 3
   1/d^2 of (0,0,0) = 1 / (16+49+4) = 0.0145 Rank 1
   
5.Diversity + Quality "fitness" is based on the sum of quality rank and 
   diversity rank, using diversity rank to break the ties.
                           Summed/Diversity Rank    D+Q "fitness" 
   (4,7,2)                    6/5                                      0.041
   (2,5,3)                    6/4                                      0.05
   (0,2,7)                    6/2                                      0.248
   (0,4,3)                    6/3                                      0.111
   (0,0,0)                    6/1                                      0.55

Problem 3. Neural Networks (20 points)

Consider the table containing heart attack data from problem 1. Suppose that "yes" is represented by the number 1 and "no" is represented by the number "0". The purpose of this problem is to design a neural network and to describe the steps that would be needed to train it so that it predicts the whether or not a patient is likely to have a heart attack. (Note that you don't need to run the process on a machine, just to describe in words how the process would take place).

Answer the following question and explain your answers:

(5 points) Neural net topology:

How many layers would you use? Why?

     
Only one hidden layer in addition to the input layer and the output layer.
BTW, only one hidden layer is needed since this is a boolean function and one hidden layer is 
sufficient to represent any boolean function (see Handouts).

How many nodes per layers would you use? Why?

     
I'll use just two hidden units since the function seems 
simple enough. In general much fewer hidden nodes than input nodes are needed.
Clearly one output node is enough to represent the binary output.

Draw a picture of your neural net.

(15 points) Training of the neural net:

What does the error backpropagation algorithm search for?

It searches for the connections' weights that minimize the 
error function (or in other words, maximize the accuracy) of the network.

How would you initialize the weights of your net's connections?


I would select small (i.e. close to 0, positive or negative), random values.

Carefully describe in words the steps performed by the error backpropagation algorithm during one epoch.

The algorithm will perform the following set of steps during one epoch and will iterate them over subsequent epochs:

Begin of Epoch: 
For each of the training examples:

Propagate the inputs forward through the net. 
Compute the output of each node (= the sigmoid 
function applied to the weighted sum of the inputs 
to the node) first for the hidden layer and then for 
the output layer.
Compute the errors (i.e. the betas) associated to 
each node, starting from the output layer and moving 
back to the hidden layer.
Compute and store the weight changes (i.e. the deltas) 
associated with the current training example for 
each of the connections in the net. 

 End of Epoch
After processing all examples, update the weight 
of each connection by assigning to it the sum of that 
weight's delta for each of the examples plus the current 
value of the weight.

until we obtain "satisfactory weights". (See the answer 
to the following question for more precise stopping 
criteria).

What stopping condition would you use?


The following are possible stopping criteria:

When the updated weights after an iteration of the 
error back propagation procedure are almost identical to
the weights before that iteration.
This is in general the best stopping criterion.
When the output of the net is very close to the
desired one. 
When the number of iterations exceeds a pre-defined
threshold. Say the procedure has run for 20,000 iterations
and it doesn't seem to be converging.
When the output error seems to be increasing instead
of decreasing.
When the classification accuracy is decreasing over the validation
data (signaling overfitting).

Problem 4. Logic-based Systems (20 points)

Consider the following set of axioms:

forall x [sum(x,0,x)]
forall x [forall y [forall z [sum(x,y,z) -> sum(x,s(y),s(z))]]]
forall t [exists y [sum(t,y,0)]]

And the conclusion:

forall x [exists w [sum(x,w,s(0))]]

As usual, 0 denotes a constant; x, y, z, w, and t denote variables; the relation sum(x,y,z) can be read as x+y=z, and s denotes a function symbol, that can be interpreted as the "successor" function (i.e. the increment by 1 function).

Prove the conclusion from the axioms by refutation using resolution:


 (10 points) Translate the axioms and the negation of the conclusion into clausal form.
We translate each of the axioms as follows:
The order below is very important - the steps of the transformation should
be followed in the precise order (see your textbook for that order):

 Negate the conclusion:

! forall x [exists w [sum(x,w,s(0))]] = exists x [forall w [!sum(x,w,s(0)]]

 We eliminate implications:

 forall x [sum(x,0,x)]

 forall x [forall y [forall z [!sum(x,y,z) || sum(x,s(y),s(z))]]]

 forall t [exists y [sum(t,y,0)]]

And the negation of the conclusion:

 exists x [forall w [!sum(x,w,s(0))]] 

 Then we eliminate existential quantifiers:


 forall x [sum(x,0,x)]

 forall x [forall y [forall z [!sum(x,y,z) || sum(x,s(y),s(z))]]]

 forall t [sum(t,f(t),0)] WHERE f IS A NEW FUNCTION SYMBOL 
                              NOT USED PREVIOUSLY IN THIS PROBLEM

 forall w [!sum(a,w,s(0))]  WHERE a IS A NEW CONSTANT SYMBOL 
                              NOT USED PREVIOUSLY IN THIS PROBLEM


Then we eliminate universal quantifiers (obtaining clausal forms
without further processing):
     
      sum(x,0,x)
     
 !sum(x,y,z) || sum(x,s(y),s(z))
     
 sum(t,f(t),0) 
     
 (negated conclusion:)  !sum(a,w,s(0))
     

(Note that we are skipping the normalization of variables step.)

(10 points) Apply resolution (state explicitly which substitutions are made) until the empty clause is found.


2. !sum(x,y,z) || sum(x,s(y),s(z))
3.  sum(t,f(t),0)
---------------------------------------------
5.  sum(t,s(f(t)),s(0))    substituting x by t, y by f(t) and z by 0.

5.  sum(t,s(f(t)),s(0))
4. !sum(a,w,s(0))
---------------------------------------------
    EMPTY CLAUSE    substituting t by a, and w by s(f(t)).

Problem 5. Planning (20 points)

Consider the following planning problem. Fred is in his living room with his robot by his side and his beer is in the kitchen. The target is to have the beer with him in the living room. More precisely,

Initial State
{at(robot,living_room), at(beer,kitchen),at(fred,living_room), door_closed(kitchen,living_room)}
Goal State
{at(robot,living_room),at(beer,living_room),at(fred,living_room), door_open(kitchen,living_room)}

Operators

Operator 1: open(R1,R2)

        IF at(robot,R1) 
           door_closed(R1,R2)
       ADD door_open(R1,R2)
    DELETE door_closed(R1,R2)

Operator 2: close(R1,R2)

        IF at(robot,R1) 
           door_open(R1,R2)
       ADD door_closed(R1,R2)
    DELETE door_open(R1,R2)

Operator 3: move(R1,R2)

        IF at(robot,R1) 
           door_open(R1,R2)
       ADD at(robot,R2)
    DELETE at(robot,R1)

Operator 4: carry(R1,R2,Obj)

        IF door_open(R1,R2) 
           at(robot,R1)
           at(Obj,R1)
       ADD at(robot,R2)
           at(Obj,R2)
    DELETE at(robot,R1)
           at(Obj,R1)

(20 points) Follow in detail the backward chaining algorithm described in Chapter 15 to construct a plan to go from the initial situation to the goal situation. Show establishes, threatens, and before links.

Problem 6. Machine Vision (20 points)

Solve problem 26.1 from your textbook. See pages 682-683. Explain your answer. Please note that there is a typo on the first table in that problem (page 682). It should be B instead of C and C instead of B (that is, B and C should be swapped).


By x3 = alpha*x1 + beta*x2, we can write down following equations:

x = alpha*(-1)+beta*(-1.48)  -- (1)
y = alpha*3+beta*2.38  -- (2)

Solve the above two equations, we get:

alpha = x - (3x+y)*0.72 -- (3)
beta = - (3x+y)/2.06 -- (4)

For object 1, we input the its x,y value to (3) and (4),  we get alpha = -1.584 , beta = 2.665. 
We then test the alpha and beta on z.
However the estimated Zc = alpha*0 + beta*0.26 = 0.69 != 0, which is the actual C. 

For object 2,  we input the its x,y value to (3) and (4), we get alpha = 0.29 , beta = 1.451.
We then test the alpha and beta on z.
However, the estimated Zc = alpha*0 + beta*0.26 = 0.377 != 0.49, which is the actual C.

For object 3,  we input the its x,y value to (3) and (4), we get alpha = -1.01, beta = 1.937
We then test the alpha and beta on z.
The estimated Zc = alpha*0 + beta*0.26 = 0.50 which is closest to the actual C, i.e., 0.49.

So Object 3 is the same model.

PATIENT ID	CHEST PAIN?	MALE?	SMOKES?	EXERCISES?	HEART ATTACK?
1.	yes	yes	no	yes	yes
2.	yes	yes	yes	no	yes
3.	no	no	yes	no	yes
4.	no	yes	no	yes	no
5.	yes	no	yes	yes	yes
6.	no	yes	yes	yes	no

Individuals	Fitness or Quality	Rank	Standard "fitness"	Rank "fitness" p=0.55	1 ---- d^2	Diversity rank	Diversity + Quality "fitness"


(4, 7, 2)	67	1	0.52	0.55	inf.	5	0.041
(2, 5, 3)	32	2	0.25	0.248	0.111	4	0.05
(0, 2, 7)	11	4	0.085	0.05	0.0152	2	0.248
(0, 4, 3)	19	3	0.147	0.111	0.0385	3	0.111
(0, 0, 0)	0	5	0	0.041	0.0145	1	0.55

CS4341 Introduction to Artificial Intelligence SOLUTIONS Homework - D 2001