[bug#57151] [PATCH 1/2] gnu: Add tesseract-ocr-tessdata-fast.
From: Simon South
Subject: [bug#57151] [PATCH 1/2] gnu: Add tesseract-ocr-tessdata-fast.
Date: Fri, 12 Aug 2022 07:27:35 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:
> * gnu/packages/ocr.scm (tesseract-ocr-tessdata-fast): New variable.
Maxim,
Would it not be better to generate a separate package for each of the
languages and scripts this data covers, as is done by Debian for
instance? The entire dataset is about a gigabyte in size and supports
more than a hundred languages, yet I imagine most people would use
only one or two.
This would mean tesseract-ocr could simply propagate the
"tesseract-ocr-tessdata-fast-eng" package rather than cherry-picking a
specific file, and would establish a convention that would be necessary
for packaging the "best" dataset as well, if that's desired.
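To illustrate the naming convention, here is a rough Guile sketch. It is
not part of the patch: the helper names and the short language list are
hypothetical, and a real implementation would generate full package
definitions rather than strings.

```scheme
;; Hypothetical sample of language codes, for illustration only; the
;; real dataset covers more than a hundred languages and scripts.
(define %example-languages '("eng" "fra" "deu"))

;; Hypothetical helper: the per-language package name suggested above,
;; e.g. "tesseract-ocr-tessdata-fast-eng" for "eng".
(define (tessdata-fast-package-name language)
  (string-append "tesseract-ocr-tessdata-fast-" language))

;; Hypothetical helper: the single trained-data file such a package
;; would install, e.g. "eng.traineddata".
(define (tessdata-fast-file-name language)
  (string-append language ".traineddata"))

;; Print one line per language showing the package-to-file mapping.
(for-each
 (lambda (language)
   (format #t "~a installs ~a~%"
           (tessdata-fast-package-name language)
           (tessdata-fast-file-name language)))
 %example-languages)
```

With one package per language, each package carries a single file, and
the same scheme would carry over to the "best" dataset by changing the
name prefix.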
(Thanks for working on this; it's been on my to-do list for a while as
well.)
--
Simon South
simon@simonsouth.net