Incorporating language restrictions

A slightly better model uses transition probabilities to represent letter-order restrictions. For each pair (x,y) of letters, a number P(y|x) between 0 and 1 is given, corresponding to the fraction of occurrences of letter x in words of the language that are immediately followed by letter y. We assume that the prior probabilities of all letters are also known; these are numbers between 0 and 1 that measure the relative frequencies of the letters in the given language (English, for example). These probabilities allow one to assess the likelihood of a given sequence of letters in the absence of image data. Treating the blank (word boundary) as an extra symbol, so that $P(L_1 \vert \text{blank})$ is the probability that a word begins with $L_1$ and $P(\text{blank} \vert L_n)$ is the probability that a word ends after $L_n$, the likelihood of observing the sequence $L_1,\ L_2,\ \cdots L_n$ can be computed as a product:

\begin{displaymath}
P(L_1,\ L_2,\ \cdots L_n)
=
P(L_1 \vert \text{blank}) \, P(L_2 \vert L_1) \cdots P(L_n \vert L_{n-1}) \, P(\text{blank} \vert L_n)
\end{displaymath}
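As a concrete illustration, here is a minimal Python sketch of this likelihood product. The function name word_likelihood, the dictionary layout, and all numerical values are assumptions made for this example rather than part of the model above; real transition probabilities would be estimated by counting letter pairs in a corpus of the language.

\begin{verbatim}
# Minimal sketch: word likelihood as a product of transition probabilities.
# The transition table below is a toy example, not real English statistics.

def word_likelihood(letters, trans, blank="_"):
    """P(L_1, ..., L_n): product of transition probabilities, including the
    transitions from the word boundary (blank) into L_1 and from L_n back
    to the boundary."""
    padded = [blank] + list(letters) + [blank]
    likelihood = 1.0
    for prev, cur in zip(padded, padded[1:]):
        likelihood *= trans.get((prev, cur), 0.0)  # P(cur | prev)
    return likelihood

# Toy transition probabilities P(y | x), keyed as (x, y).
trans = {
    ("_", "t"): 0.16, ("t", "h"): 0.35, ("h", "e"): 0.50, ("e", "_"): 0.20,
    ("_", "h"): 0.07, ("t", "e"): 0.10, ("h", "t"): 0.01, ("t", "_"): 0.05,
}

print(word_likelihood("the", trans))  # 0.16 * 0.35 * 0.50 * 0.20 = 0.0056
\end{verbatim}

Note that any sequence containing a letter pair that never occurs in the language receives likelihood zero, since the corresponding transition probability vanishes.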

The language-generation model encoded in the transition probabilities can now be combined with the image-generation model implicit in the conditional probabilities $P(I_k \vert L_k)$ described previously, in order to quantify the likelihood of a given sequence of letters in the presence of specific image data. Indeed, the equation of the previous section becomes:

\begin{displaymath}
P(L_1,\ \cdots L_n \vert I_1,\ \cdots I_n )
=
C
\cdot
P(I_1 \vert L_1) \cdots P(I_n \vert L_n)
\cdot
P(L_1 \vert \text{blank}) \, P(L_2 \vert L_1) \cdots P(L_n \vert L_{n-1}) \, P(\text{blank} \vert L_n)
\end{displaymath}
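To make the combined score concrete, the following Python sketch evaluates the right-hand side for a candidate letter sequence, omitting the constant C because it is the same for every candidate and therefore does not affect the comparison. The structure image_likelihoods[k][letter], standing for $P(I_k \vert L_k)$, and all numerical values below are illustrative assumptions; in practice the image factors would come from the character-image model of the previous section.

\begin{verbatim}
# Sketch of the combined (unnormalized) score: image factors times
# letter-transition factors. All numbers are toy values for illustration.

def sequence_score(letters, image_likelihoods, trans, blank="_"):
    """Unnormalized P(L_1, ..., L_n | I_1, ..., I_n)."""
    score = 1.0
    for k, letter in enumerate(letters):            # image factors P(I_k | L_k)
        score *= image_likelihoods[k].get(letter, 0.0)
    padded = [blank] + list(letters) + [blank]      # language factors
    for prev, cur in zip(padded, padded[1:]):
        score *= trans.get((prev, cur), 0.0)
    return score

# Toy data: two letter images, each ambiguous between two letters.
image_likelihoods = [{"t": 0.7, "f": 0.3}, {"o": 0.6, "c": 0.4}]
trans = {("_", "t"): 0.16, ("t", "o"): 0.11, ("o", "_"): 0.10,
         ("_", "f"): 0.04, ("f", "o"): 0.08, ("t", "c"): 0.01,
         ("f", "c"): 0.01, ("c", "_"): 0.02}

print(sequence_score("to", image_likelihoods, trans))  # largest of the four candidates
print(sequence_score("fo", image_likelihoods, trans))
\end{verbatim}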

The text recognition problem therefore reduces to the following optimization problem: maximize the likelihood that appears on the right-hand side of the above equation over all letter sequences $L_1 \cdots L_n$. The winning letter sequence is the one most likely to have led to the observed images, given the known transition probabilities between letters.
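A brute-force Python sketch of this optimization simply scores every candidate sequence and keeps the best one. The tiny alphabet and the toy tables below are illustrative assumptions; with a realistic alphabet the number of candidate sequences grows exponentially with n, which is what motivates the graph formulation of the next section.

\begin{verbatim}
# Brute-force maximization of the combined score over all letter sequences.
# The alphabet and probability tables are toy values for illustration only.
from itertools import product

ALPHABET = "cfot"  # deliberately tiny so exhaustive enumeration stays cheap

def sequence_score(letters, image_likelihoods, trans, blank="_"):
    """Unnormalized posterior: image factors times transition factors."""
    score = 1.0
    for k, letter in enumerate(letters):
        score *= image_likelihoods[k].get(letter, 0.0)
    padded = [blank] + list(letters) + [blank]
    for prev, cur in zip(padded, padded[1:]):
        score *= trans.get((prev, cur), 0.0)
    return score

def recognize(image_likelihoods, trans):
    """Return the letter sequence maximizing the combined score."""
    n = len(image_likelihoods)
    candidates = ("".join(seq) for seq in product(ALPHABET, repeat=n))
    return max(candidates,
               key=lambda c: sequence_score(c, image_likelihoods, trans))

image_likelihoods = [{"t": 0.7, "f": 0.3}, {"o": 0.6, "c": 0.4}]
trans = {("_", "t"): 0.16, ("t", "o"): 0.11, ("o", "_"): 0.10,
         ("_", "f"): 0.04, ("f", "o"): 0.08}

print(recognize(image_likelihoods, trans))  # "to"
\end{verbatim}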

