
Using only image data

If one assumes that all letters occur independently of one another and with the same probability, then the letter co-occurrence probability $P(L_1,\ L_2,\ \cdots L_n)$ is the same for every word hypothesis of a given length, so that term effectively drops out of the maximization problem formulated in the preceding section. The expression to be maximized is then just the image generation probability:

\begin{displaymath}
P( I_1,\ I_2,\ \cdots I_n \vert L_1,\ L_2,\ \cdots L_n )
\end{displaymath}
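Explicitly, writing $p$ for the common probability of each letter, Bayes' rule gives

\begin{displaymath}
P( L_1,\ \cdots L_n \vert I_1,\ \cdots I_n )
\;\propto\;
P( I_1,\ \cdots I_n \vert L_1,\ \cdots L_n ) \, P( L_1,\ \cdots L_n ),
\qquad
P( L_1,\ \cdots L_n ) = p^n ,
\end{displaymath}

so the prior contributes the same factor $p^n$ to every hypothesis, and maximizing the posterior over letter sequences is equivalent to maximizing the likelihood above.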

We will assume for simplicity that the identity of a letter completely determines the probability that a given image is generated for that letter, and that the images for different letters within a given word are generated independently of one another. The image probabilities for the various letters must be learned by the system by examining multiple sample images of each letter. Under these assumptions, the above expression factors into a product of per-letter terms:

\begin{displaymath}
P( I_1,\ I_2,\ \cdots I_n \vert L_1,\ L_2,\ \cdots L_n )
= P(I_1 \vert L_1) \, P(I_2 \vert L_2) \cdots P(I_n \vert L_n)
\end{displaymath}
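As a minimal sketch of how this factorization is used in practice (Python, not part of the original text; the helper letter_image_logprob(image, letter) is a hypothetical stand-in for a per-letter image model estimated from the sample images mentioned above), the product of probabilities is computed as a sum of logarithms to avoid numerical underflow:

\begin{verbatim}
def word_log_likelihood(images, letters, letter_image_logprob):
    # log P(I_1,...,I_n | L_1,...,L_n) under the independence
    # assumption: the product of the per-letter terms P(I_k | L_k)
    # becomes a sum of their logarithms.
    assert len(images) == len(letters)
    return sum(letter_image_logprob(image, letter)
               for image, letter in zip(images, letters))
\end{verbatim}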

It is now possible to break the maximization problem down into $n$ subproblems: for each position in the sequence, one selects the letter that is most consistent with the image at that position; in other words, $L_k$ should maximize the expression $P(I_k \vert L_k)$. The resulting sequence of letters is then the ``winning'' word.
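Continuing the sketch above (again hypothetical Python, reusing the assumed letter_image_logprob helper), the $n$ subproblems are solved by an independent argmax at each position:

\begin{verbatim}
import string

def decode_word(images, letter_image_logprob,
                alphabet=string.ascii_lowercase):
    # For each image I_k, select the letter L_k that maximizes
    # P(I_k | L_k); the concatenation of the winners is the word.
    return ''.join(
        max(alphabet,
            key=lambda letter: letter_image_logprob(image, letter))
        for image in images)
\end{verbatim}

Because each position is decoded in isolation, nothing prevents the result from being a sequence that is not a legal English word, which is precisely the weakness discussed next.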

Models such as those described above use knowledge about image generation, and in particular about the consistency between a given image and the hypothesis that a particular letter gave rise to that image. However, such models completely ignore the letter-order restrictions that exist in English; for instance, the non-word ``cmt'' could beat ``cat'' if the middle image happens to match ``m'' slightly better than ``a'', despite one's intuitive sense that there should be a strong syntactic preference in favor of the latter.


Sergio A. Alvarez
4/26/2000