guix-patches

From: Simon South
Subject: [bug#61851] [PATCH] gnu: tesseract-ocr-tessdata-fast: Install tesseract config files.
Date: Mon, 27 Feb 2023 17:43:43 -0500
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)

Jelle,

Respectfully, and speaking only as an interested observer, I think this
may not be the right fix.

Guix's Tesseract is indeed missing its config files, causing (among
other things) the examples in the online documentation[0] to not work,
e.g.:

  ssouth@hamlet ~/tesseract-ocr-test [env]$ tesseract images/eurotext.png - -l eng hocr
  read_params_file: Can't open hocr
  The (quick) [brown] {fox} jumps!
  Over the $43,456.78 <lazy> #90 dog
  (...)

But the root issue appears to be a misconfiguration of the
TESSDATA_PREFIX search path in the tesseract-ocr package, which causes
Tesseract's own config files to be installed in a folder other than the
one it's configured to search.
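To make the failure mode concrete, here is a rough sketch of how Tesseract 4 and later appear to resolve a config name such as "hocr" when TESSDATA_PREFIX is set: it looks in the "configs" and then "tessconfigs" subdirectories of that prefix, and finally treats the name as a path relative to the current directory. The function resolve_tesseract_config is hypothetical (it is not part of Tesseract or Guix) and only approximates Tesseract's lookup; details may differ between versions. If the config files are installed somewhere TESSDATA_PREFIX does not point to, every step fails, which matches the "Can't open hocr" error shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tesseract or Guix) approximating how
# Tesseract 4+ resolves a config name such as "hocr" when TESSDATA_PREFIX
# is set. Compiled-in default paths and --tessdata-dir are not modelled.
resolve_tesseract_config() {
  name=$1
  prefix=${TESSDATA_PREFIX%/}   # drop a single trailing slash, if present

  # Steps 1 and 2: look in the "configs" and "tessconfigs" subdirectories.
  for dir in configs tessconfigs; do
    if [ -f "$prefix/$dir/$name" ]; then
      printf '%s\n' "$prefix/$dir/$name"
      return 0
    fi
  done

  # Step 3: fall back to treating the name as a path from the current directory.
  if [ -f "$name" ]; then
    printf '%s\n' "$name"
    return 0
  fi

  # Nothing found: this corresponds to the error Tesseract prints.
  echo "read_params_file: Can't open $name" >&2
  return 1
}
```

The point of the sketch is that only the directory named by TESSDATA_PREFIX is consulted, so installing the config files next to the trained-data files under that directory is enough for names like "hocr" to resolve.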

Correcting the search path places Tesseract's config files and the
trained-data files together beneath /usr/share/tessdata, allowing
Tesseract to work as expected:

  ssouth@hamlet ~/tesseract-ocr-test [env]$ tesseract images/eurotext.png - -l eng hocr
  <?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  (...)

This approach has the advantage of keeping the
tesseract-ocr-tessdata-fast package "pure" and focused only on
trained-data files, which will be important for the patch I'm working on
that will split it into multiple packages, one for each language and
script, to allow greater flexibility.

I'll respond to this email with a draft (!) patch to tesseract-ocr that
should achieve the same result as yours, making the config files
available for use.  Does this also fix the problem for you?  If so,
would you consider submitting this change instead?

-- 
Simon South
simon@simonsouth.net

[0] https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage.html