From: Antonio Diaz Diaz
Subject: Re: [Bug-ocrad] OCR of messy computer printout
Date: Wed, 07 Sep 2011 18:41:44 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.7.11) Gecko/20050905
Hello Karl,

Karl Berry wrote:
> I tried playing with --threshold, but the results were not noticeably better. Are there other options or approaches which might help?
The image is not only not-very-cleanly scanned, it is also not-very-well printed. I guess it was printed with a real teletype. See the way a lot of zeros are printed in this image:
· O O · · ·
O O O O O ·
· · · O O ·
· · · · O ·
· · · · O O
· · + · O O
· · · · O ·
· · · · O ·
· · · · O ·
O · · O O ·
· O O O · ·

The resolution of the scan is also too low. Ocrad needs a character height of 20 pixels for a good result, but as you can see, the zero above is only 11 pixels high. Using --scale=2 improves the result, but not much.
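
The arithmetic behind --scale=2 can be sketched as follows. This is only an illustration, not part of ocrad: the helper name min_integer_scale and the constant are made up for the example, and the only figures taken from this message are the 20-pixel target and the 11-pixel zero.

```python
import math

# Character height (in pixels) quoted above as needed for a good result.
GOOD_HEIGHT_PX = 20


def min_integer_scale(char_height_px, target_px=GOOD_HEIGHT_PX):
    """Return the smallest integer factor that brings a character of
    char_height_px pixels up to at least target_px pixels (hypothetical helper)."""
    if char_height_px <= 0:
        raise ValueError("character height must be positive")
    return max(1, math.ceil(target_px / char_height_px))


if __name__ == "__main__":
    factor = min_integer_scale(11)  # the zero above is 11 pixels high
    print(f"--scale={factor} gives characters {11 * factor} pixels high")
```

Scaling only enlarges the pixels that are already there; it does not restore the parts of the glyph that were never printed or scanned, which is consistent with the small improvement seen here.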
I think ocrad can't extract anything useful from this image. I think it will be difficult even for a human to retype it without errors.
Best regards,
Antonio.