CS4341 Introduction to Artificial Intelligence. C2000
Solutions - Exam 2 - February 29, 2000

By Prof. Carolina Ruiz
Department of Computer Science
Worcester Polytechnic Institute

NAME: Carolina Ruiz _______________________________________

Problem I. Constraint Satisfaction (20 points)

Suppose that a medical doctor needs your help constructing a constraint net to find prescriptions for his/her flu patients. The following table shows the symptoms exhibited by flu patients and the list of medications that can be prescribed for each symptom.
Symptom                Medications

Headache               H1, H2, H3
Runny nose             R1, R2
Fever                  F1, F2
Sore throat            S1, S2
Assume also that the FDA has found that each of the following sets of medications presents dangerous drug interactions, and so the drugs in each set must not be prescribed together.
Drugs that must NOT be prescribed together
On the other hand, the following sets of drugs work in conjunction, so if one drug in a set is prescribed then the others must be prescribed as well.
Drugs that must be prescribed together
Answer the following questions:
  1. (10 points) Construct a constraint net that helps the medical doctor in finding appropriate prescriptions for his/her flu patients. Provide the constraint matrix associated with each link in the net explicitly.
    
       ----------         M2         ------------
      | Headache |------------------| Runny Nose |
       ----------                    ------------
           |
        M1 |
           |
           |
      -------------                    -------
     | Sore Throat |------------------| Fever |
      -------------        M3          -------
    
    
     M1 |  S1  S2      M2 |  R1  R2     M3 |  F1  F2
     -------------     -------------    -------------
     H1 |  0   0       H1 |  1   1      S1 |  0   1
     H2 |  0   1       H2 |  1   0      S2 |  1   0
     H3 |  1   0       H3 |  1   1
    
    
    

  2. (10 points) A patient may be allergic to some drugs. Hence, an appropriate prescription for a flu patient should consist of 4 medications, one for each symptom, that satisfy the constraints above and to which the patient is not allergic. Find an appropriate prescription for Lucy, who is allergic to F1 and R2. Describe in detail each of the steps of your problem-solving strategy:

    1. Arc Consistency Check
      During this stage of the process, arcs are revised to eliminate
      values that are no longer valid given Lucy's allergies.
      The revised domain of each variable is given below:
      
      
      • Fever: Reduced domain = {F2}. F1 is eliminated because Lucy is allergic to it.

      • Runny Nose: Reduced domain = {R1}. R2 is eliminated because Lucy is allergic to it.

      • Sore Throat: Reduced domain = {S1}. S2 is eliminated because, according to constraint matrix M3, the only medication for sore throat that can be prescribed in conjunction with F2 is S1.

      • Headache: Reduced domain = {H3}. H1 and H2 are eliminated because, according to constraint matrix M1, the only medication for headache that can be prescribed in conjunction with S1 is H3. Note that H1 would be eliminated anyway, because all entries in M1 corresponding to H1 are 0.

    2. Depth First Search
      1. Selection of the 1st node
      2. Selection of each consecutive node
      The search is quite simple since the reduced domain of each 
      variable contains just one value. The only possible solution 
      is {H3,R1,F2,S1}. Selecting the variables in any order will 
      lead to the same valid solution in the same amount of time.
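
      This whole procedure can be checked mechanically. The sketch below
      (Python; the variable and function names are my own) encodes the
      three constraint matrices as sets of allowed pairs, removes the
      drugs Lucy is allergic to, and then enumerates all remaining
      assignments (brute force, rather than arc consistency followed by
      depth-first search):

          from itertools import product

          # Domains for each symptom, before considering allergies.
          domains = {
              "Headache":   ["H1", "H2", "H3"],
              "RunnyNose":  ["R1", "R2"],
              "Fever":      ["F1", "F2"],
              "SoreThroat": ["S1", "S2"],
          }

          # Matrices M1, M2, M3 as sets of allowed pairs (their 1-entries).
          allowed = {
              ("Headache", "SoreThroat"): {("H2", "S2"), ("H3", "S1")},  # M1
              ("Headache", "RunnyNose"):  {("H1", "R1"), ("H1", "R2"),   # M2
                                           ("H2", "R1"), ("H3", "R1"),
                                           ("H3", "R2")},
              ("SoreThroat", "Fever"):    {("S1", "F2"), ("S2", "F1")},  # M3
          }

          # Lucy's allergies shrink the domains before the search.
          for drug in ("F1", "R2"):
              for var in domains:
                  if drug in domains[var]:
                      domains[var].remove(drug)

          variables = list(domains)
          for values in product(*(domains[v] for v in variables)):
              assignment = dict(zip(variables, values))
              if all((assignment[a], assignment[b]) in pairs
                     for (a, b), pairs in allowed.items()):
                  print(assignment)   # only {H3, R1, F2, S1} survives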
      


Problem II. Logic-Based Systems (20 points)

Consider the following set of axioms:
  1. forall x [equal(x,x)]

  2. forall y,z [equal(y,z) -> equal(z,y)]

  3. forall w,s,t [equal(w,s) and equal(s,t) -> equal(w,t)]

  4. equal(b,a)

  5. equal(b,c)
And the conclusion: equal(a,c). As usual, a, b, and c denote constants; and x, y, z, w, s, and t denote variables.

Prove the conclusion from the axioms by refutation using resolution:

First, convert the axioms to clause form (writing ~ for negation and v for or):

  1. equal(x,x)
  2. ~equal(y,z) v equal(z,y)
  3. ~equal(w,s) v ~equal(s,t) v equal(w,t)
  4. equal(b,a)
  5. equal(b,c)

and add the negation of the conclusion:

  6. ~equal(a,c)

Resolution steps:

  7. Resolve 2 and 4 with {y/b, z/a}:   equal(a,b)
  8. Resolve 3 and 7 with {w/a, s/b}:   ~equal(b,t) v equal(a,t)
  9. Resolve 8 and 5 with {t/c}:        equal(a,c)
 10. Resolve 9 and 6:                   empty clause

Since the empty clause was derived, the axioms together with the negation
of the conclusion are inconsistent; hence equal(a,c) follows from the
axioms.
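A mechanical check of the refutation is possible here because a, b, and c
are the only constants: ground the axioms over {a, b, c} and close the set
of known equalities under them. The sketch below (Python; the naming is my
own) confirms that equal(a,c) is derivable, so adding its negation
~equal(a,c) produces a contradiction:

    # Ground the axioms over {a, b, c} and compute the closure of the
    # known equalities under reflexivity, symmetry, and transitivity.
    constants = ["a", "b", "c"]
    facts = {("b", "a"), ("b", "c")}             # axioms 4 and 5
    facts |= {(x, x) for x in constants}         # axiom 1: reflexivity

    changed = True
    while changed:                               # naive forward chaining
        changed = False
        derived = {(z, y) for (y, z) in facts}   # axiom 2: symmetry
        derived |= {(w, t)                       # axiom 3: transitivity
                    for (w, s1) in facts
                    for (s2, t) in facts if s1 == s2}
        if not derived <= facts:
            facts |= derived
            changed = True

    print(("a", "c") in facts)                   # prints True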

Problem III. Learning - Decision Trees (20 points)

The following table contains training examples that help a robot janitor predict whether or not an office contains a recycling bin.

   STATUS    DEPT.   OFFICE SIZE   RECYCLING BIN?
1. faculty   ee      large         no
2. staff     ee      small         no
3. faculty   cs      medium        yes
4. student   ee      large         yes
5. staff     cs      medium        no
6. faculty   cs      large         yes
7. student   ee      small         yes
8. staff     cs      medium        no

  1. (15 points) Use information theory to construct a minimal decision tree that predicts whether or not an office contains a recycling bin. Show each step of the computation.
                x      1     2/3    1/3    1/2
         ---------------------------------------
          log_2(x)     0    -0.6   -1.6    -1
    
    
    
    • Selecting the attribute for the root node:
      • Status:
        • Faculty: (3/8)*[ -(2/3)*log_2(2/3) - (1/3)*log_2(1/3)]
        • Staff: + (3/8)*[ -(0/3)*log_2(0/3) - (3/3)*log_2(3/3)]
        • Student: + (2/8)*[ -(2/2)*log_2(2/2) - (0/2)*log_2(0/2)]
        • TOTAL: = 0.35 + 0 + 0 = 0.35
      • Size:
        • Large: (3/8)*[ -(2/3)*log_2(2/3) - (1/3)*log_2(1/3)]
        • Medium: + (3/8)*[ -(1/3)*log_2(1/3) - (2/3)*log_2(2/3)]
        • Small: + (2/8)*[ -(1/2)*log_2(1/2) - (1/2)*log_2(1/2)]
        • TOTAL: = 0.35 + 0.35 + 0.25 = 0.95
      • Dept:
        • ee: (4/8)*[ -(2/4)*log_2(2/4) - (2/4)*log_2(2/4)]
        • cs: + (4/8)*[ -(2/4)*log_2(2/4) - (2/4)*log_2(2/4)]
        • TOTAL: = 0.5 + 0.5 = 1
      Since status is the attribute with the lowest entropy, it is selected
      as the root node:

                         Status
                     /     |      \
             faculty/      |staff  \student
                   /       |        \
              1-,3+,6+  2-,5-,8-   4+,7+
               BIN=?     BIN=NO    BIN=YES

      Only the branch corresponding to Status=Faculty needs further
      processing.

    • Selecting the attribute to split the branch Status=Faculty:
      • Dept:
        • ee: (1/3)*[ -(0/1)*log_2(0/1) - (1/1)*log_2(1/1)]
        • cs: + (2/3)*[ -(2/2)*log_2(2/2) - (0/2)*log_2(0/2)]
        • TOTAL: = 0 + 0 = 0
      Since the minimum possible entropy is 0 and dept achieves it, dept is
      selected as the best attribute to split the branch Status=Faculty.
      Note that we don't even have to calculate the entropy of the attribute
      size, since it cannot possibly be better than 0.

                         Status
                     /     |      \
             faculty/      |staff  \student
                   /       |        \
              1-,3+,6+  2-,5-,8-   4+,7+
                Dept     BIN=NO    BIN=YES
               /    \
            ee/      \cs
             /        \
            1-       3+,6+
          BIN=NO    BIN=YES

    Each branch ends in a homogeneous set of examples, so the construction
    of the decision tree ends here.
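
    The entropy bookkeeping above is easy to reproduce in code. The sketch
    below (Python; the naming is my own) computes the weighted average
    entropy left after splitting on each attribute; it prints 0.34, 1.0,
    and 0.94 for status, dept, and size, matching the 0.35 and 0.95 above
    up to the rounding in the log table:

        from math import log2

        # Training examples: (status, dept, size, bin).
        examples = [
            ("faculty", "ee", "large",  "no"),
            ("staff",   "ee", "small",  "no"),
            ("faculty", "cs", "medium", "yes"),
            ("student", "ee", "large",  "yes"),
            ("staff",   "cs", "medium", "no"),
            ("faculty", "cs", "large",  "yes"),
            ("student", "ee", "small",  "yes"),
            ("staff",   "cs", "medium", "no"),
        ]

        def entropy(pos, neg):
            total = pos + neg
            h = 0.0
            for k in (pos, neg):
                if k:                       # 0 * log2(0) is taken to be 0
                    h -= (k / total) * log2(k / total)
            return h

        def remaining_entropy(attr, rows):
            """Weighted average entropy after splitting rows on column attr."""
            total, rem = len(rows), 0.0
            for v in {r[attr] for r in rows}:
                subset = [r for r in rows if r[attr] == v]
                pos = sum(1 for r in subset if r[3] == "yes")
                rem += len(subset) / total * entropy(pos, len(subset) - pos)
            return rem

        for attr, name in [(0, "status"), (1, "dept"), (2, "size")]:
            print(name, round(remaining_entropy(attr, examples), 2))
        # status 0.34   dept 1.0   size 0.94  -> status is the best root split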
  2. (5 points) Translate your decision tree into a collection of decision rules.
    IF status=faculty AND dept=ee THEN recycling-bin=no
    
    IF status=faculty AND dept=cs THEN recycling-bin=yes
    
    IF status=staff THEN recycling-bin=no
    
    IF status=student THEN recycling-bin=yes
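
    Read as code, the rules become a tiny classifier. The sketch below
    (Python; the function name is my own) folds the two faculty rules into
    one conditional; note that office size is never tested:

        def recycling_bin(status, dept, size):
            if status == "faculty":
                # covers both faculty rules (only ee and cs occur in the data)
                return "yes" if dept == "cs" else "no"
            if status == "staff":
                return "no"
            return "yes"   # status == "student"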
    


Problem IV. Learning - Neural Networks (20 points)

Consider the following function of two variables:
        A  B  Desired Output
        0  0    1
        1  0    0
        0  1    0
        1  1    1
  1. (5 points) Prove that this function cannot be learned by a single perceptron that uses the step function as its activation function.
    This function cannot be computed using a single perceptron since
    the function is not linearly separable:
    
          B
           ^
           |
           0    1
           |
           |
        ---1----0-------->
           |              A

    No straight line in the A-B plane can separate the 1s, (0,0) and
    (1,1), from the 0s, (1,0) and (0,1), since the two classes sit at
    opposite corners of the unit square; hence no single perceptron
    (a linear threshold unit) can compute the function.

  2. (15 points) Explain how you would train a neural network (in which each unit uses the sigmoid function as its activation function) to learn this function by describing in detail each of the steps below:
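
    A minimal sketch of such a training procedure (Python with NumPy; the
    2-2-1 architecture, learning rate, and iteration count are illustrative
    choices, not prescribed by the problem): initialize small random
    weights, repeatedly do forward passes on the four examples, and adjust
    every weight by backpropagating the squared-error gradient until the
    outputs approach the targets.

        import numpy as np

        rng = np.random.default_rng(0)
        X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
        t = np.array([[1], [0], [0], [1]], dtype=float)   # desired outputs

        def sigmoid(z):
            return 1.0 / (1.0 + np.exp(-z))

        # One hidden layer with 2 units; weights start small and random.
        W1 = rng.normal(0.0, 1.0, (2, 2)); b1 = np.zeros(2)
        W2 = rng.normal(0.0, 1.0, (2, 1)); b2 = np.zeros(1)
        lr = 0.5

        for _ in range(20000):
            h = sigmoid(X @ W1 + b1)              # forward pass: hidden layer
            y = sigmoid(h @ W2 + b2)              # forward pass: output
            delta_out = (y - t) * y * (1 - y)     # backprop: output deltas
            delta_hid = (delta_out @ W2.T) * h * (1 - h)
            W2 -= lr * (h.T @ delta_out); b2 -= lr * delta_out.sum(axis=0)
            W1 -= lr * (X.T @ delta_hid); b1 -= lr * delta_hid.sum(axis=0)

        # Typically ends close to [1, 0, 0, 1]; with only 2 hidden units,
        # some unlucky initializations can get stuck in a local minimum.
        print(np.round(y.ravel(), 2))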


Problem V. Learning - Genetic Algorithms (20 points)

Consider a genetic algorithm in which the rank method is used to select candidates. Recall that, given a fixed value p between 0 and 1, the probability of picking the first-ranked candidate is p; the probability of picking the second-ranked candidate is p times the remaining probability (1-p), that is, p*(1-p); the probability of picking the third-ranked candidate is p times the probability still remaining, that is, p*(1-(p + p*(1-p))) = p*(1-p)*(1-p); and so on. In general, the candidate ranked k receives probability p*(1-p)^(k-1), until only one candidate is left; that last candidate receives all of the remaining probability, (1-p)^(n-1), so that the probabilities of all the individuals in the population sum up to 1.
  1. (7 points) Suppose that you have five candidates. You decide to select one using the rank method with p=0.25. Compute the probability of selecting each of the five candidates as a function of rank.
      
        Individuals     Rank      Rank Fitness
        -----------------------------------------
            C1           1st         0.25
    
            C2           2nd         0.1875    = 0.25*(1-0.25)
    
            C3           3rd         0.140625  = 0.25*(1-0.4375)
    
            C4           4th         0.1054687 = 0.25*(1-0.578125) 
    
            C5           5th         0.3164063 = 1-0.6835937
        
    
    Note that:
    
    • The sum of the rank fitnesses of C1 + C2 = 0.4375
    • The sum of the rank fitnesses of C1 + C2 + C3 = 0.578125
    • The sum of the rank fitnesses of C1 + C2 + C3 + C4 = 0.6835937
    • The sum of the rank fitnesses of C1 + C2 + C3 + C4 + C5 = 1
  2. (7 points) What is peculiar about the rank fitnesses that you computed in the previous part?
    That the probability of selecting the least fit element (i.e. C5)
    is greater than the probabilities of selecting any of the other 
    (fitter) individuals.
    
  3. (6 points) Can you avoid the peculiarity you observed in part 2 by restricting the probability p to a particular range? Your answer should be independent of the number of individuals in the population.
    Pick the constant p strictly greater than 0.5. That way, the rank
    fitness is guaranteed to be a monotonically decreasing function of
    the rank, since each individual consumes, in turn, more than half
    of the probability that remains; in particular, the leftover
    (1-p)^(n-1) assigned to the last candidate is smaller than the
    probability p*(1-p)^(n-2) of the candidate ranked just above it.
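
    Both observations are easy to verify numerically. The sketch below
    (Python; the function name is my own) reproduces the table from part 1
    with p=0.25, and shows that a value such as p=0.6 yields strictly
    decreasing rank fitnesses, as part 3 predicts:

        def rank_probabilities(p, n):
            """Selection probability for each of n rank-ordered candidates."""
            probs, remaining = [], 1.0
            for _ in range(n - 1):
                probs.append(p * remaining)
                remaining *= 1 - p            # what is left for lower ranks
            probs.append(remaining)           # the last candidate takes the rest
            return probs

        print(rank_probabilities(0.25, 5))
        # [0.25, 0.1875, 0.140625, 0.10546875, 0.31640625]  -- C5 is most likely
        print(rank_probabilities(0.6, 5))
        # [0.6, 0.24, 0.096, 0.0384, 0.0256]  -- strictly decreasing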