Incorporating language restrictions

A slightly better model uses transition probabilities to represent letter-order restrictions. For each pair (x,y) of letters, a number P(y|x) between 0 and 1 is given, corresponding to the fraction of occurrences of letter x in words of the language that are immediately followed by letter y. We assume that the prior probabilities of all letters are also known; these are numbers between 0 and 1 that measure the relative frequencies of the letters in the given language (English, for example). These probabilities allow one to assess the likelihood of a given sequence of letters in the absence of image data. Treating the blank (word boundary) as an extra symbol, so that $P(L_1 \vert \text{blank})$ is the probability that a word begins with $L_1$ and $P(\text{blank} \vert L_n)$ is the probability that a word ends after $L_n$, the likelihood of observing the sequence $L_1,\ L_2,\ \cdots L_n$ can be computed as a product:

\begin{displaymath}
P(L_1,\ L_2,\ \cdots L_n)
=
P(L_1 \vert \text{blank}) \, P(L_2 \vert L_1) \cdots P(L_n \vert L_{n-1}) \, P(\text{blank} \vert L_n)
\end{displaymath}
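As a concrete illustration, here is a minimal Python sketch of this likelihood product. The function name word_likelihood, the dictionary layout, and all numerical values are assumptions made for this example rather than part of the model above; real transition probabilities would be estimated by counting letter pairs in a corpus of the language.

\begin{verbatim}
# Minimal sketch: word likelihood as a product of transition probabilities.
# The transition table below is a toy example, not real English statistics.

def word_likelihood(letters, trans, blank="_"):
    """P(L_1, ..., L_n): product of transition probabilities, including the
    transitions from the word boundary (blank) into L_1 and from L_n back
    to the boundary."""
    padded = [blank] + list(letters) + [blank]
    likelihood = 1.0
    for prev, cur in zip(padded, padded[1:]):
        likelihood *= trans.get((prev, cur), 0.0)  # P(cur | prev)
    return likelihood

# Toy transition probabilities P(y | x), keyed as (x, y).
trans = {
    ("_", "t"): 0.16, ("t", "h"): 0.35, ("h", "e"): 0.50, ("e", "_"): 0.20,
    ("_", "h"): 0.07, ("t", "e"): 0.10, ("h", "t"): 0.01, ("t", "_"): 0.05,
}

print(word_likelihood("the", trans))  # 0.16 * 0.35 * 0.50 * 0.20 = 0.0056
\end{verbatim}

Note that any sequence containing a letter pair that never occurs in the language receives likelihood zero, since the corresponding transition probability vanishes.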

The language-generation model encoded in the transition probabilities can now be combined with the image-generation model implicit in the conditional probabilities $P(I_k \vert L_k)$ described previously, in order to quantify the likelihood of a given sequence of letters in the presence of specific image data. Indeed, the equation of the previous section becomes:

\begin{displaymath}
P(L_1,\ \cdots L_n \vert I_1,\ \cdots I_n )
=
C
\cdot
P(I_1 \vert L_1) \cdots P(I_n \vert L_n)
\cdot
P(L_1 \vert \text{blank}) \, P(L_2 \vert L_1) \cdots P(L_n \vert L_{n-1}) \, P(\text{blank} \vert L_n)
\end{displaymath}
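To make the combined score concrete, the following Python sketch evaluates the right-hand side for a candidate letter sequence, omitting the constant C because it is the same for every candidate and therefore does not affect the comparison. The structure image_likelihoods[k][letter], standing for $P(I_k \vert L_k)$, and all numerical values below are illustrative assumptions; in practice the image factors would come from the character-image model of the previous section.

\begin{verbatim}
# Sketch of the combined (unnormalized) score: image factors times
# letter-transition factors. All numbers are toy values for illustration.

def sequence_score(letters, image_likelihoods, trans, blank="_"):
    """Unnormalized P(L_1, ..., L_n | I_1, ..., I_n)."""
    score = 1.0
    for k, letter in enumerate(letters):            # image factors P(I_k | L_k)
        score *= image_likelihoods[k].get(letter, 0.0)
    padded = [blank] + list(letters) + [blank]      # language factors
    for prev, cur in zip(padded, padded[1:]):
        score *= trans.get((prev, cur), 0.0)
    return score

# Toy data: two letter images, each ambiguous between two letters.
image_likelihoods = [{"t": 0.7, "f": 0.3}, {"o": 0.6, "c": 0.4}]
trans = {("_", "t"): 0.16, ("t", "o"): 0.11, ("o", "_"): 0.10,
         ("_", "f"): 0.04, ("f", "o"): 0.08, ("t", "c"): 0.01,
         ("f", "c"): 0.01, ("c", "_"): 0.02}

print(sequence_score("to", image_likelihoods, trans))  # largest of the four candidates
print(sequence_score("fo", image_likelihoods, trans))
\end{verbatim}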

The text recognition problem therefore reduces to the following optimization problem: maximize the likelihood that appears on the right-hand side of the above equation over all letter sequences $L_1 \cdots L_n$. The winning letter sequence is the one most likely to have led to the observed images, given the known transition probabilities between letters.
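A brute-force Python sketch of this optimization simply scores every candidate sequence and keeps the best one. The tiny alphabet and the toy tables below are illustrative assumptions; with a realistic alphabet the number of candidate sequences grows exponentially with n, which is what motivates the graph formulation of the next section.

\begin{verbatim}
# Brute-force maximization of the combined score over all letter sequences.
# The alphabet and probability tables are toy values for illustration only.
from itertools import product

ALPHABET = "cfot"  # deliberately tiny so exhaustive enumeration stays cheap

def sequence_score(letters, image_likelihoods, trans, blank="_"):
    """Unnormalized posterior: image factors times transition factors."""
    score = 1.0
    for k, letter in enumerate(letters):
        score *= image_likelihoods[k].get(letter, 0.0)
    padded = [blank] + list(letters) + [blank]
    for prev, cur in zip(padded, padded[1:]):
        score *= trans.get((prev, cur), 0.0)
    return score

def recognize(image_likelihoods, trans):
    """Return the letter sequence maximizing the combined score."""
    n = len(image_likelihoods)
    candidates = ("".join(seq) for seq in product(ALPHABET, repeat=n))
    return max(candidates,
               key=lambda c: sequence_score(c, image_likelihoods, trans))

image_likelihoods = [{"t": 0.7, "f": 0.3}, {"o": 0.6, "c": 0.4}]
trans = {("_", "t"): 0.16, ("t", "o"): 0.11, ("o", "_"): 0.10,
         ("_", "f"): 0.04, ("f", "o"): 0.08}

print(recognize(image_likelihoods, trans))  # "to"
\end{verbatim}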

