CS 4341 D01 - Homework Solutions

WPI Worcester Polytechnic Institute

Computer Science Department
------------------------------------------

CS4341 Introduction to Artificial Intelligence 
SOLUTIONS Homework - D 2001

By Songting Chen and Carolina Ruiz 

This homework assignment is OPTIONAL. It will be counted as 10% of extra credit towards your final grade and is intended as practice for the final exam. 
------------------------------------------

Instructions


  • Homework Problems: This homework consists of six problems:

    Problem 1. Decision Trees (20 points)

    The following table contains training examples that help predict whether a patient is likely to have a heart attack.

    PATIENT ID CHEST PAIN? MALE? SMOKES? EXERCISES? HEART ATTACK?
    1. yes yes no yes yes
    2. yes yes yes no yes
    3. no no yes no yes
    4. no yes no yes no
    5. yes no yes yes yes
    6. no yes yes yes no

    1. (15 points) Use information theory to construct a minimal decision tree that predicts whether or not a patient is likely to have a heart attack. SHOW EACH STEP OF THE COMPUTATION.
                  x      1  1/4  2/3  3/4  1/3  1/2
           ---------------------------------------
            log_2(x)     0   -2 -0.6 -0.4 -1.6  -1
      
      
      • Selecting the attribute for the root node: Heart Attack YES NO
        • Chest Pain:
          • Yes: (3/6)*[ -(3/3)*log_2(3/3) - (0/3)*log_2(0/3)]
          • No: (3/6)*[ -(1/3)*log_2(1/3) - (2/3)*log_2(2/3)]
          • TOTAL: = 0 + 0.47 = 0.47
        • Male:
          • Yes: (4/6)*[ -(2/4)*log_2(2/4) - (2/4)*log_2(2/4)]
          • No: (2/6)*[ -(2/2)*log_2(2/2) - (0/2)*log_2(0/2)]
          • TOTAL: = 0.667 + 0 = 0.667
        • Smokes:
          • Yes: (4/6)*[ -(3/4)*log_2(3/4) - (1/4)*log_2(1/4)]
          • No: (2/6)*[ -(1/2)*log_2(1/2) - (1/2)*log_2(1/2)]
          • TOTAL: = 0.53 + 0.33 = 0.86
        • Exercises:
          • Yes: (4/6)*[ -(2/4)*log_2(2/4) - (2/4)*log_2(2/4)]
          • No: (2/6)*[ -(2/2)*log_2(2/2) - (0/2)*log_2(0/2)]
          • TOTAL: = 0.667 + 0 = 0.667
        Since Chest Pain is the attribute with the lowest entropy, it is selected as the root node: Chest Pain / \ Yes / \ No / \ 1+,2+,5+ 3+,4-,6- HA=YES HA=? Only the branch corresponding to Chest Pain = No needs further processing.

      • Selecting the attribute to split the branch Chest Pain = No: Only Examples 3,4,6 need to be taken into consideration. YES NO
        • Male:
        • Yes: (2/3)*[ -(0/2)*log_2(0/2) - (2/2)*log_2(2/2)]
        • No: (1/3)*[ -(1/1)*log_2(1/1) - (0/1)*log_2(0/1)]
        • TOTAL: = 0 + 0 = 0
        Since the minimum possible entropy is 0 and Male has that minimum possible entropy, it is selected as the best attribute to split the branch Chest Pain=No. Note that we don't even have to calculate the entropy of other two attributes, i.e., Smokes,Exercises, since they cannot possibly be better than 0. BTW, note that the entropy of Exercises is also 0 over examples 3, 4, 6 and so it could also have been used to split this node. Chest Pain / \ Yes / \ No / \ 1+,2+,5+ 3+,4-,6- HA=YES Male / \ Yes / \ No / \ 4-,6- 3+ HA=NO HA=YES
      Each branch ends in a homogeneous set of examples so the construction of the decision tree ends here.
    2. (5 points) Translate your decision tree into a collection of decision rules.
      IF Chest Pain=Yes THEN Heart Attack = Yes
      
      IF Chest Pain=No AND Male=Yes THEN Heart Attack = No
      
      IF Chest Pain=No AND Male=No THEN Heart Attack = Yes
      
      
      
      

    Problem 2. Genetic Algorithms (20 points)

    Consider a genetic algorithm in which individuals are represented a triples (x,y,z), where x, y, and z are non-negative real numbers. Let the fitness function be given by f(x,y,z) = (x*x) + (y*y) + z.

    Complete the following table showing the probabilities of selecting each of the individuals below according to the different selection methods. For the diversity method, suppose that the first individual in the table (4,7,2) has been selected. Compute the diversity rank for all of the remaining individuals with respect to that first individual. The distance between two individuals will be given by the Euclidean distance between them, i.e. d((x,y,z),(w,s,t)) = sqrt((x-w)^2 + (y-s)^2 + (z-t)^2), where a^2 means a to the 2nd power, or in other words a*a.

    NOTE: Following Winston's notation, below I'm calling "fitness" (in quotes) the probability that an indiviual has of being selected under a given selection method.

    SHOW EACH STEP OF YOUR WORK.
    Individuals 
     
     
    Fitness 
    or Quality 
    Rank 
     
     
    Standard 
    "fitness" 
    Rank 
    "fitness" 
    p=0.55 

    ---- 
    d^2 
    Diversity 
    rank 
    Diversity 
    + Quality 
    "fitness" 
     
     
    (4, 7, 2)  67  0.52  0.55  inf.  0.041 
    (2, 5, 3)  32  0.25  0.248  0.111  0.05 
    (0, 2, 7)  11  0.085  0.05  0.0152  0.248 
    (0, 4, 3)  19  0.147  0.111  0.0385  0.111 
    (0, 0, 0)  0.041  0.0145  0.55 

    Calculation: 
    1.Quality of (4,7,2) = 4^2 + 7^2 + 2 = 67; Rank 1
       Quality of (2,5,3) = 2^2 + 5^2 + 3 = 32; Rank 2 
       Quality of (0,2,7) = 0^2 + 2^2 + 7 = 11; Rank 4
       Quality of (0,4,3) = 0^2 + 4^2 + 3 = 19; Rank 3
       Quality of (0,0,0) = 0^2 + 0^2 + 0 = 0; Rank 5
    
    2.Total = 11+ 32 + 67 + 19 + 0=129;
       Standard "fitness" of (4,7,2) = 67/129 = 0.52;
       Standard "fitness" of (2,5,3) = 32/129 = 0.25;
       Standard "fitness" of (0,2,7) = 11/129 = 0.085;
       Standard "fitness" of (0,4,3) = 19/129 = 0.147;
       Standard "fitness" of (4,7,2) = 0/129 = 0;
       
    3.Rank "fitness" of (4,7,2) = p = 0.55
       Rank "fitness" of (2,5,3) = p(1-p) = 0.55*0.45 = 0.25
       Rank "fitness" of (0,2,7) = p(1-p)(1-p)(1-p) = 0.05
       Rank "fitness" of (0,4,3) = p(1-p)(1-p) = 0.111
       Rank "fitness" of (0,0,0) = 1 - all of above = 0.041
    
    4.1/d^2 of (4,7,2) = 1 / (0) = inf. Rank 5
       1/d^2 of (2,5,3) = 1 / (4+4+1) = 0.111 Rank 4
       1/d^2 of (0,2,7) = 1 / (16+25+25) = 0.0152 Rank 2
       1/d^2 of (0,4,3) = 1 / (16+9+1) = 0.0385 Rank 3
       1/d^2 of (0,0,0) = 1 / (16+49+4) = 0.0145 Rank 1
       
    5.Diversity + Quality "fitness" is based on the sum of quality rank and 
       diversity rank, using diversity rank to break the ties.
                               Summed/Diversity Rank    D+Q "fitness" 
       (4,7,2)                    6/5                                      0.041
       (2,5,3)                    6/4                                      0.05
       (0,2,7)                    6/2                                      0.248
       (0,4,3)                    6/3                                      0.111
       (0,0,0)                    6/1                                      0.55
    

    Problem 3. Neural Networks (20 points)

    Consider the table containing heart attack data from problem 1. Suppose that "yes" is represented by the number 1 and "no" is represented by the number "0". The purpose of this problem is to design a neural network and to describe the steps that would be needed to train it so that it predicts the whether or not a patient is likely to have a heart attack. (Note that you don't need to run the process on a machine, just to describe in words how the process would take place).

    Answer the following question and explain your answers:

    1. (5 points) Neural net topology:
      • How many layers would you use? Why?
             
        Only one hidden layer in addition to the input layer and the output layer.
        BTW, only one hidden layer is needed since this is a boolean function and one hidden layer is 
        sufficient to represent any boolean function (see Handouts). 
        
      • How many nodes per layers would you use? Why?
             
        I'll use just two hidden units since the function seems 
        simple enough. In general much fewer hidden nodes than input nodes are needed.
        Clearly one output node is enough to represent the binary output.
        
      • Draw a picture of your neural net.

    2. (15 points) Training of the neural net:
      • What does the error backpropagation algorithm search for?
        It searches for the connections' weights that minimize the 
        error function (or in other words, maximize the accuracy) of the network.
        
        
      • How would you initialize the weights of your net's connections?
        
        I would select small (i.e. close to 0, positive or negative), random values. 
        
        
      • Carefully describe in words the steps performed by the error backpropagation algorithm during one epoch.
        The algorithm will perform the following set of steps during one epoch and will iterate them over subsequent epochs:
        
        • Begin of Epoch:
        • For each of the training examples:
          • Propagate the inputs forward through the net.
          • Compute the output of each node (= the sigmoid function applied to the weighted sum of the inputs to the node) first for the hidden layer and then for the output layer.
          • Compute the errors (i.e. the betas) associated to each node, starting from the output layer and moving back to the hidden layer.
          • Compute and store the weight changes (i.e. the deltas) associated with the current training example for each of the connections in the net.
        • End of Epoch
        • After processing all examples, update the weight of each connection by assigning to it the sum of that weight's delta for each of the examples plus the current value of the weight.
        until we obtain "satisfactory weights". (See the answer to the following question for more precise stopping criteria).
      • What stopping condition would you use?
        
        The following are possible stopping criteria:
        
        • When the updated weights after an iteration of the error back propagation procedure are almost identical to the weights before that iteration. This is in general the best stopping criterion.
        • When the output of the net is very close to the desired one.
        • When the number of iterations exceeds a pre-defined threshold. Say the procedure has run for 20,000 iterations and it doesn't seem to be converging.
        • When the output error seems to be increasing instead of decreasing.
        • When the classification accuracy is decreasing over the validation data (signaling overfitting).

    Problem 4. Logic-based Systems (20 points)

    Consider the following set of axioms:
    1. forall x [sum(x,0,x)]

    2. forall x [forall y [forall z [sum(x,y,z) -> sum(x,s(y),s(z))]]]

    3. forall t [exists y [sum(t,y,0)]]
    And the conclusion: As usual, 0 denotes a constant; x, y, z, w, and t denote variables; the relation sum(x,y,z) can be read as x+y=z, and s denotes a function symbol, that can be interpreted as the "successor" function (i.e. the increment by 1 function).

    Prove the conclusion from the axioms by refutation using resolution:

    
    
    
  • Then we eliminate universal quantifiers (obtaining clausal forms without further processing):
    1. sum(x,0,x)
    2. !sum(x,y,z) || sum(x,s(y),s(z))
    3. sum(t,f(t),0)
    4. (negated conclusion:) !sum(a,w,s(0))
    (Note that we are skipping the normalization of variables step.)
  • (10 points) Apply resolution (state explicitly which substitutions are made) until the empty clause is found.
    
    2. !sum(x,y,z) || sum(x,s(y),s(z))
    3.  sum(t,f(t),0)
    ---------------------------------------------
    5.  sum(t,s(f(t)),s(0))    substituting x by t, y by f(t) and z by 0.
    
    5.  sum(t,s(f(t)),s(0))
    4. !sum(a,w,s(0))
    ---------------------------------------------
        EMPTY CLAUSE    substituting t by a, and w by s(f(t)).
    
    

    Problem 5. Planning (20 points)

    Consider the following planning problem. Fred is in his living room with his robot by his side and his beer is in the kitchen. The target is to have the beer with him in the living room. More precisely, (20 points) Follow in detail the backward chaining algorithm described in Chapter 15 to construct a plan to go from the initial situation to the goal situation. Show establishes, threatens, and before links.


    Problem 6. Machine Vision (20 points)

    Solve problem 26.1 from your textbook. See pages 682-683. Explain your answer. Please note that there is a typo on the first table in that problem (page 682). It should be B instead of C and C instead of B (that is, B and C should be swapped).
    
    By x3 = alpha*x1 + beta*x2, we can write down following equations:
    
    x = alpha*(-1)+beta*(-1.48)  -- (1)
    y = alpha*3+beta*2.38  -- (2)
    
    Solve the above two equations, we get:
    
    alpha = x - (3x+y)*0.72 -- (3)
    beta = - (3x+y)/2.06 -- (4)
    
    For object 1, we input the its x,y value to (3) and (4),  we get alpha = -1.584 , beta = 2.665. 
    We then test the alpha and beta on z.
    However the estimated Zc = alpha*0 + beta*0.26 = 0.69 != 0, which is the actual C. 
    
    For object 2,  we input the its x,y value to (3) and (4), we get alpha = 0.29 , beta = 1.451.
    We then test the alpha and beta on z.
    However, the estimated Zc = alpha*0 + beta*0.26 = 0.377 != 0.49, which is the actual C.
    
    For object 3,  we input the its x,y value to (3) and (4), we get alpha = -1.01, beta = 1.937
    We then test the alpha and beta on z.
    The estimated Zc = alpha*0 + beta*0.26 = 0.50 which is closest to the actual C, i.e., 0.49.
    
    So Object 3 is the same model.