pdf-devel

RE: [pdf-devel] Proposal of API for the Encoded Text module


From: Leonard Rosenthol
Subject: RE: [pdf-devel] Proposal of API for the Encoded Text module
Date: Mon, 28 Jan 2008 05:28:57 -0800

One other thing to remember: PDF Names are either a subset of
PDFDocEncoding _OR_ they are valid UTF-8 strings.  (See PDFRef 1.7,
3.2.4).

Leonard

-----Original Message-----
From: address@hidden
[mailto:address@hidden On Behalf Of
Aleksander Morgado
Sent: Monday, January 28, 2008 7:30 AM
To: address@hidden
Subject: [pdf-devel] Proposal of API for the Encoded Text module

Hi all,

Find attached my changes to the proposed API for the Encoded Text 
module. It's a diff to the gnupdf.texi file.

Some comments on the changes:

- Host encoding management will probably need a second round. We will 
need to clearly determine which OSes lack iconv support (excluding 
Windows OSes, which have their own way of handling host encodings).

- Regarding the issue of the `best encoding' to encode a given character
string, I really think that UTF-8 could be the default best encoding for
all of the OSes supporting iconv (GNU/Linux, Unix, Mac OS X...) and even
for Windows OSes (AFAIK, UTF-8 is available in all modern Windows 
versions... should we support older versions?). UTF-8, in fact, is one
of the encoding conversions that will be built into the library.

- Maybe we should decide now on the full list of supported OSes (and OS
versions, if needed), rather than work it out during the development 
phase. This will help not only to determine the specific OS-dependent 
functions for host encoding support (determining which OSes don't handle
UTF-8, for example), but also to identify platforms that lack some 
required feature (e.g. 64-bit integers, discussed in another thread). I
could start a new page in the Wiki for this issue.

- All the functions involving encoding conversion (even those which 
create and initialize a new text object) return the status of the 
conversion, which should always be checked.

- I renamed functions involving PDF Strings 
(pdf_text_new_from_pdf_string) and PDF Doc Encoding 
(pdf_text_get_pdfdocenc, pdf_text_set_pdfdocenc), so that it is clear 
which one is being considered (PDF Strings can be in PDF Doc Encoding or
in UTF-16BE with BOM).

- pdf_text_concat won't work with text objects that carry different 
country/language code information, so the returned status of this 
function should always be checked.

- In addition to the `best encoding', another function is given to get 
the default host encoding configured in the user's locale 
(pdf_text_get_host_encoding).

- Finally, a new function, pdf_text_init, is given to `initialize' the 
text module. It must be called at program startup and is not 
thread-safe. This function will be used to load the user's locale 
information.
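
To make the PDF String point above a bit more concrete, here is a 
minimal sketch in C of how the two cases could be told apart: a string 
starting with the bytes FE FF is UTF-16BE with BOM, anything else is 
taken as PDF Doc Encoding. All names below (sketch_status, 
sketch_pdf_str_encoding, sketch_detect_pdf_string) are hypothetical and 
only for illustration; they are not part of the proposed API. The 
function returns a status in the same spirit as the conversion 
functions, so the caller has to check it.

```c
#include <stddef.h>

/* Hypothetical names for illustration only; not the proposed API. */
typedef enum {
    SKETCH_OK = 0,      /* encoding detected */
    SKETCH_EBADDATA     /* invalid arguments or malformed input */
} sketch_status;

typedef enum {
    SKETCH_STR_PDFDOCENC,   /* no BOM: bytes are PDF Doc Encoding */
    SKETCH_STR_UTF16BE      /* leading BOM (FE FF): UTF-16BE */
} sketch_pdf_str_encoding;

/* Detect how the bytes of a PDF String are encoded.
 * On success, *enc holds the detected encoding and *payload_offset the
 * index of the first byte after the BOM (0 if there is no BOM). */
sketch_status
sketch_detect_pdf_string(const unsigned char *data, size_t len,
                         sketch_pdf_str_encoding *enc,
                         size_t *payload_offset)
{
    if (enc == NULL || payload_offset == NULL)
        return SKETCH_EBADDATA;
    if (data == NULL && len > 0)
        return SKETCH_EBADDATA;

    if (len >= 2 && data[0] == 0xFE && data[1] == 0xFF) {
        /* UTF-16BE code units are 2 bytes each, so the payload
         * after the BOM must have an even length. */
        if ((len - 2) % 2 != 0)
            return SKETCH_EBADDATA;
        *enc = SKETCH_STR_UTF16BE;
        *payload_offset = 2;
        return SKETCH_OK;
    }

    /* No BOM: treat the whole string as PDF Doc Encoding. */
    *enc = SKETCH_STR_PDFDOCENC;
    *payload_offset = 0;
    return SKETCH_OK;
}
```

A caller would check the returned status first and only then read the 
detected encoding and the payload offset. The odd-length check is an 
assumption of this sketch about what should count as malformed input.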


Additional comments are welcome,

--
Aleksander



