From: Aleksander Morgado
Subject: Re: [pdf-devel] Proposal of API for the Encoded Text module
Date: Mon, 28 Jan 2008 15:42:35 +0100
User-agent: Thunderbird 2.0.0.9 (Macintosh/20071031)
> Just one other thing to remember is that PDF Names are either a subset of PDFDocEncoding _OR_ they are valid UTF-8 strings. (See PDFRef 1.7, 3.2.4).
>
> Leonard
In fact, I think this is one of the reasons to have built-in UTF-8 support in the library. I suppose that the PDF Name and PDF String types in the `object library' will be based on pdf_text_t from the `base library', which supports UTF-8 directly.
Anyway, are you sure that these two encodings are the only ones allowed for PDF Names? In older Acrobat versions PDF Names could be encoded in specific `host encodings', like Shift-JIS or Big Five for Asian languages (PDFRef 1.7, H.3).
If this is the case, how can we detect the encoding used in a PDF Name? Consider, for example, a PDF whose names are stored in a Japanese host encoding and which is read on a US-localized system. So far the text module API provides a function to find the best encoding for a given Unicode string, but not a function to detect the encoding used in a given multibyte string. Something like the latter may also be needed, but I am not sure whether it is possible to implement.
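As a rough sketch of the one part of this that does look feasible: a multibyte string can at least be tested for being well-formed UTF-8, since most non-ASCII Shift-JIS or Big Five byte sequences fail that test. The function name `name_is_valid_utf8` below is hypothetical and not part of the proposed text module API. Note that a positive result only means "could be UTF-8" (pure ASCII passes in every one of these encodings), and a negative result still does not say which host encoding was used.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, NOT part of the proposed text module API.
   Returns 1 if buf[0..len) is well-formed UTF-8, 0 otherwise. */
int
name_is_valid_utf8 (const unsigned char *buf, size_t len)
{
  size_t i = 0;

  assert (buf != NULL || len == 0);

  while (i < len)
    {
      unsigned char c = buf[i];
      size_t extra;       /* continuation bytes expected after c */
      size_t j;
      unsigned long cp;   /* code point decoded so far */
      unsigned long min;  /* smallest code point legal for this length */

      if (c < 0x80)
        { extra = 0; cp = c; min = 0; }
      else if ((c & 0xE0) == 0xC0)
        { extra = 1; cp = c & 0x1F; min = 0x80; }
      else if ((c & 0xF0) == 0xE0)
        { extra = 2; cp = c & 0x0F; min = 0x800; }
      else if ((c & 0xF8) == 0xF0)
        { extra = 3; cp = c & 0x07; min = 0x10000; }
      else
        return 0;         /* stray continuation byte or invalid lead byte */

      if (extra > len - i - 1)
        return 0;         /* sequence truncated at end of buffer */

      for (j = 1; j <= extra; j++)
        {
          if ((buf[i + j] & 0xC0) != 0x80)
            return 0;     /* not a continuation byte */
          cp = (cp << 6) | (unsigned long) (buf[i + j] & 0x3F);
        }

      /* Reject overlong forms, UTF-16 surrogates and out-of-range values. */
      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;

      i += extra + 1;
    }

  return 1;
}
```

With this check, the bytes E6 97 A5 E6 9C AC (two Japanese characters in UTF-8) are accepted, while the Shift-JIS bytes 93 FA 96 7B are rejected because 0x93 cannot start a UTF-8 sequence. Telling Shift-JIS apart from Big Five after a rejection would need extra information or statistical guessing, which is the open question above.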
-- Aleksander